Posts Tagged ‘Data’
IT Jargon in Group
An API is the part of an application that communicates with other applications, and APIs can be used to enable microservices. APIs support a digital transformation strategy by sharing business capabilities in a partner ecosystem, unlocking new business channels, and creating customer value.
Model–view–controller (MVC) was traditionally used for desktop graphical user interfaces (GUIs); the pattern later became popular for designing web applications.
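To make the MVC split a bit more concrete, here is a minimal Python sketch (the PostModel, render_posts, and PostController names are illustrative only and not tied to any particular framework):

```python
# Minimal MVC sketch: the model holds data, the view formats it,
# and the controller connects requests to both.

class PostModel:                      # Model: data and business rules
    def __init__(self):
        self._posts = []

    def add(self, title, body):
        self._posts.append({"title": title, "body": body})

    def all(self):
        return list(self._posts)


def render_posts(posts):              # View: presentation only
    return "\n".join(f"* {p['title']}: {p['body']}" for p in posts)


class PostController:                 # Controller: handles user actions
    def __init__(self, model):
        self.model = model

    def create(self, title, body):
        self.model.add(title, body)

    def index(self):
        return render_posts(self.model.all())


controller = PostController(PostModel())
controller.create("Hello", "First post")
print(controller.index())             # -> "* Hello: First post"
```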
Facebook Group:
WEB FRAMEWORK (Web CMS Framework to Web Application Framework)
Drupal Indonesia (Jun 2008), 4.3K
WordPress Indonesia (Mar 2012), 73K – Three Tier (not MVC)
CodeIgniter Indonesia (Mar 2009, name changed Feb 2020), 100K
Laravel Indonesia (April 2020), 17K
Ruby on Rails Indonesia (Dec 2011), 5.5K
BACK END DEVELOPMENT
PHP Indonesia (Aug 2008), 164K
Python Indonesia (Nov 2012), 39K
Node.js Indonesia (May 2012), 27K
Javascript Programmer Indonesia (Feb 2018, name changed Nov 2020), 14K
FRONT END DEVELOPMENT
ReactJS Indonesia (Aug 2015), 26K
Angular Indonesia (Aug 2013, name changed Mar 2017), 12K
VueJS Indonesia (Nov 2015), 19K
OTHERS
Odoo Indonesia (Aug 2014), 1.7K
MongoDB Indonesia (Mar 2012), 2.2K
AWS User Group Indonesia (Sep 2013), 3.2K
Taudata Analytics: Data Science, Big Data, IOT (Jan 2014, name changed Feb 2022), 19K
Hosting: Dreamhost, Hostgator
Kimball Nine-Step Data Warehouse
Interestingly, the main computer science textbook for data warehousing is the one by Kimball & Ross.
Data Warehouse
1. Choose the Process. Determine the subject of the problem at hand; identify the key processes in the operational activities.
2. Choose the Grain. The grain is the fact-table data that can be analyzed; choosing the grain means deciding what a record in the fact table represents.
3. Identify & Conform the Dimensions. Identify the dimensions in enough detail to describe things; these relationships are built as tables.
4. Choose the Facts. Select the facts that will be used. Each fact holds data that can be calculated and then presented as a report, graph, or diagram.
5. Store Pre-calculations in the Fact Table. Once the facts have been chosen, each one should be re-examined to see whether there is an opportunity to use pre-calculations.
6. Round Out the Dimension Tables. Add descriptive text back into the dimension tables; the text should be intuitive and easy for users to understand.
7. Decide the Duration of the Database & Periodicity of Update. Determine how far back the data loaded into the fact table should go, and pay attention to the accuracy of the historical data.
8. Track Slowly Changing Dimensions. Changes to a dimension table can be handled in three ways: overwrite the dimension table directly, create a new record for each change, or store the change in a new, separate column (see the sketch after this list).
9. Decide Query Priorities & Query Modes. Determine the period of the ETL (extract, transform & load) process.
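As a rough illustration of step 8, below is a minimal Python sketch of the second approach (a Type 2 slowly changing dimension, where a new record is created for each change); the customer dimension and its column names are my own example, not from Kimball & Ross:

```python
from datetime import date

# Each dimension row carries validity dates and a current-row flag,
# so history is preserved when an attribute changes (SCD Type 2).
dim_customer = [
    {"customer_key": 1, "customer_id": "C001", "city": "Jakarta",
     "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True},
]

def apply_scd_type2(dim_rows, customer_id, new_city, change_date):
    """Close the current row and append a new one for the changed value."""
    for row in dim_rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["city"] == new_city:
                return                      # nothing changed
            row["valid_to"] = change_date   # expire the old version
            row["is_current"] = False
    dim_rows.append({
        "customer_key": max(r["customer_key"] for r in dim_rows) + 1,
        "customer_id": customer_id, "city": new_city,
        "valid_from": change_date, "valid_to": None, "is_current": True,
    })

apply_scd_type2(dim_customer, "C001", "Bandung", date(2022, 6, 1))
for row in dim_customer:
    print(row)   # two versions of C001: Jakarta (expired) and Bandung (current)
```

Overwriting in place (the first approach) loses history; the Type 2 approach above preserves it at the cost of extra rows and a surrogate key per version.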
Data governance: A conceptual framework, structured review, and research agenda
Rene Abraham, Johannes Schneider, Jan vom Brocke (2019)
Some papers are just beautifully written.
Data Science Jargon

This is the very popular Data Science Venn Diagram from Drew Conway, from around 2010. It marked the beginning of new jargon. I still prefer Negnevitsky's (2002) book as the best explanation of this. I also found that some jargon has shifted: Business Intelligence has become Business Analytics, while BI itself is now more about visualization and reporting.
Data – Roadmap
I am starting to realize the biggest problems with data:
1. There is no perfect framework (neither DMBOK nor DCAM is sufficient)

https://datacrossroads.nl/2018/12/02/data-management-metamodels-damadmbok2-dcam/
2. Multiple perspectives on data

3. Different understandings of data
The database school vs. the architecture school
https://stackoverflow.com/questions/3873346/logical-model-versus-domain-model

Master Data: “A consistent and uniform set of identifiers and attributes that describe the core entities of the enterprise and are used across multiple business processes.” Gartner Data & Analytics Summit
MDM: “that describe the core entities of the enterprise” – Master data exists and should be identified within multiple data domains:
• Party – This data domain can extend to an individual customer in B2C or a business (even hierarchy) in a B2B context. In other contexts, this domain can be extended to Patients, Providers, Citizens, Suspects or any other domain that has similar attributes like name, contact information, and, most importantly, survivorship rules when one or more records are resolved into a single record that represents the truth of that entity. It can also extend to suppliers, although that can arguably be considered a distinctly different data domain, and many MDM vendors will treat it as such.
• Product – A critical data domain for commerce, catalog, and compliance efforts, product master data differs from customer data primarily in that while the volume of records may be lower, the complexity is higher, hierarchies are much more important, and collaborative authorship is key.
• Asset – This domain can extend to both physical assets, such as replacement parts in a warehouse, physical buildings, such as hotels or warehouses, or beds within a hospital, for example, or digital assets, such as images, videos or reviews that relate to another data entity like a product or a location.
• Location – A more nebulous, but no less critical data domain, location master data will usually be a geographical location that corresponds to another data domain (like an address for a retail store or location of an off-shore drilling site) or some other location reference that can be tracked, such as delivery trucks or IP addresses.
Data Domain: a high-level block used to define master data
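To make the survivorship rules mentioned under the Party domain a bit more concrete, here is a minimal Python sketch that resolves several candidate records into a single golden record; the rule used (prefer the most recently updated non-empty value per attribute) is just one common choice, not a prescribed standard:

```python
from datetime import date

# Three party records believed to describe the same customer.
candidates = [
    {"name": "Budi S.",      "email": "",                 "phone": "0812-111", "updated": date(2021, 3, 1)},
    {"name": "Budi Santoso", "email": "budi@example.com", "phone": "",         "updated": date(2023, 5, 9)},
    {"name": "B. Santoso",   "email": "budi@old.example", "phone": "0812-222", "updated": date(2019, 7, 2)},
]

def survive(records, attributes):
    """Build a golden record: per attribute, keep the newest non-empty value."""
    golden = {}
    for attr in attributes:
        newest = None
        for rec in sorted(records, key=lambda r: r["updated"]):
            if rec[attr]:               # skip empty values
                newest = rec[attr]      # newer records overwrite older ones
        golden[attr] = newest
    return golden

print(survive(candidates, ["name", "email", "phone"]))
# -> {'name': 'Budi Santoso', 'email': 'budi@example.com', 'phone': '0812-111'}
```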
Data – Maturity Ladder
https://datacrossroads.nl/2018/12/02/data-management-metamodels-damadmbok2-dcam/
Data warehouse sources are often operational or transactional systems. In these types of systems, the master data comes along for the ride when an event or transaction occurs, such as a change in product inventory levels or a customer making a purchase. MDM often incorporates all possible master data sources, including not only data associated with or generated by internal systems, but also external data.
Another major difference between MDM and data warehousing is that MDM focuses on providing the enterprise with a single, unified and consistent view of these key business entities by creating and maintaining their best data representations. While a data warehouse often maintains a full history of the changes to these entities, its current view represents the last update. Plus, each data warehouse update is applied to the current view without a re-assessment of how previous updates might change the best representation.
Matching and consolidating related records doesn’t typically occur in data warehousing. MDM, on the other hand, standardizes, matches and consolidates common data elements across all possible data sources for a subject area to iteratively refresh and maintain its best master record.
https://blogs.sas.com/content/datamanagement/2017/04/27/mdm-different-data-warehousing
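As a rough sketch of the standardize-match-consolidate idea described above (not the SAS approach itself; the fields and the 0.85 name-similarity threshold are purely illustrative):

```python
from difflib import SequenceMatcher

def standardize(record):
    """Normalize the attributes used for matching (trim, lowercase)."""
    return {
        "name": " ".join(record["name"].lower().split()),
        "email": record["email"].strip().lower(),
    }

def is_match(a, b, name_threshold=0.85):
    """Simple match rule: same email, or sufficiently similar names."""
    a, b = standardize(a), standardize(b)
    if a["email"] and a["email"] == b["email"]:
        return True
    similarity = SequenceMatcher(None, a["name"], b["name"]).ratio()
    return similarity >= name_threshold

crm_record = {"name": "Budi  Santoso", "email": "Budi@Example.com"}
erp_record = {"name": "budi santoso",  "email": "budi@example.com"}
web_record = {"name": "Siti Rahma",    "email": "siti@example.com"}

print(is_match(crm_record, erp_record))  # True  (same email after standardization)
print(is_match(crm_record, web_record))  # False
```

Records that match would then feed the survivorship step sketched earlier to maintain the best master record.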
Data – Master Data, Reference Data and Metadata
Metadata (data about data)
– A Data Dictionary is a detailed definition and description of data sets (tables) and their fields (columns), aka Technical Metadata
– A Business Glossary/Data Glossary is a list of business terms with their definitions. It defines business concepts for an organization or industry and is independent of any specific database or vendor, aka Business Metadata
Master Data (describes an entity, a noun, e.g. customer, product, location)
– Reference data is concerned with classification and categorisation
– Master data is concerned with business entities.
Transaction Data (describes an action, a verb, e.g. buy)
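A small Python sketch of how these categories relate in practice (the field names and the country-code table are my own illustrative choices):

```python
# Reference data: classification codes, usually a small controlled list.
country_codes = {"ID": "Indonesia", "SG": "Singapore"}   # code -> meaning

# Master data: the business entity itself (a noun), used across processes.
customer = {"customer_id": "C001", "name": "Budi Santoso", "country": "ID"}
product  = {"product_id": "P010", "name": "Laptop", "category": "Electronics"}

# Transaction data: an action (a verb) that references master data.
order = {"order_id": "O-2024-001", "customer_id": "C001",
         "product_id": "P010", "quantity": 2}

# Metadata: data about the data above, e.g. a data-dictionary entry
# (technical metadata) and a business-glossary entry (business metadata).
data_dictionary_entry = {
    "table": "customer", "column": "customer_id",
    "type": "CHAR(4)", "description": "Primary key of the customer table",
}
business_glossary_entry = {
    "term": "Customer",
    "definition": "A party that has purchased or may purchase our products",
}

print(country_codes[customer["country"]])   # resolve reference data -> "Indonesia"
```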
Data Modeling Tools
Data Modeling
- Data Modeling Tools: DDL (Data Definition Language), e.g. SQLyog, phpMyAdmin
- Lineage Tools: ETL (e.g. Talend)
- Data Profiling Tools: data profiling is the process of reviewing source data, understanding its structure, content, and interrelationships, and identifying its potential for data projects (see the sketch after this list)
- Metadata Repository
- Data Model Pattern
- Industry Data Models (e.g. the Shared Information/Data Model (SID), aka The Information Framework, or PPDM)
Basically an “advanced” database
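A minimal data-profiling sketch using pandas (the sample data is made up, and real profiling tools go much further, e.g. pattern and dependency analysis):

```python
import pandas as pd

# A small sample of "source data" to profile.
df = pd.DataFrame({
    "customer_id": ["C001", "C002", "C002", "C004"],
    "age":         [34, None, 51, 29],
    "country":     ["ID", "ID", "SG", None],
})

# Basic profile: row count, missing values, distinct values, numeric ranges.
profile = pd.DataFrame({
    "nulls":    df.isna().sum(),
    "distinct": df.nunique(),
})
print("rows:", len(df))
print(profile)
print(df["age"].describe())   # min / max / mean of a numeric column

# Simple finding: duplicate business keys are a data quality issue.
duplicates = df[df.duplicated(subset="customer_id", keep=False)]
print(duplicates)             # the two C002 rows
```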
Data Governance Reading
I read several books about DG over the weekend.
The key governance areas:
– Data Quality (Critical Data Element (CDE) identification: regulatory, finance, operations)
– Metadata (CRUD, retention)
– (to some degree) Data Security
(Plotkin 2014):
– data warehousing,
– master data management,
– data quality improvement,
– system development
– information security
Major roles and responsibilities of Business Data Stewards:
– Metadata: definitions, derivations, data quality rules, creation and usage rules.
– Stewardship and ownership: what they mean and levels of decision making.
– Data quality: what it means and establishing data quality levels in context.
(Ladley 2012): More general principles
(Soares 2014)
I find it interesting that a framework for data can be based on the DMBOK domains, data domains, organization domains, process domains, CDEs, or big data, at least six options (Soares 2014). In practical terms, he mentions at least three:
– Data quality management
– Metadata management
– Security and privacy
DMBOK Data Governance Scope
– Creating and managing core Metadata
– Documenting rules and standards
– Managing data quality issues
– Executing operational data governance activities
Data quality management
Identify critical data elements.
Define business rules based on critical data elements (see the sketch at the end of this list).
Resolve data issues uncovered by data profiling.
Metadata management
Add and modify the definitions for business terms.
Associate business rules with business terms.
Associate reference data with business terms.
Associate business terms with table and column names.
Master data management
Identify matching attributes.
Create match rules.
Resolve duplicate suspects.
Add and modify hierarchies and groupings.
Reference data management (RDM)
Add, modify, and delete code values and code tables.
Map code values between code tables in different applications.
Security and privacy
Define sensitive data.
Flag sensitive data in the metadata repository.
Validate access by users to key systems.
Revalidate access by users to key systems.
Provide input into the acceptable use of data based on business needs, regulatory compliance, and brand reputation.
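A minimal Python sketch of what "define business rules based on critical data elements" and "flag sensitive data" could look like in practice; the rules, element names, and classifications are illustrative, not taken from DMBOK:

```python
# Hypothetical critical data elements with simple business rules attached.
cde_rules = {
    "customer_id":  lambda v: isinstance(v, str) and v.startswith("C"),
    "email":        lambda v: isinstance(v, str) and "@" in v,
    "credit_limit": lambda v: isinstance(v, (int, float)) and 0 <= v <= 1_000_000,
}

# Sensitive-data flags kept alongside a (toy) metadata repository.
sensitive_columns = {"email": "PII", "credit_limit": "Confidential"}

record = {"customer_id": "C001", "email": "budi_at_example.com", "credit_limit": 50_000}

def check_record(rec):
    """Return the CDEs whose business rules fail for this record."""
    return [cde for cde, rule in cde_rules.items() if not rule(rec.get(cde))]

print("rule violations:", check_record(record))   # ['email'] (no '@' present)
for column in record:
    if column in sensitive_columns:
        print(f"{column} is flagged as {sensitive_columns[column]}")
```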