Anjar Priandoyo

Catatan Setiap Hari

Posts Tagged ‘Data’

Data Management

So many standards:

  • CRISP-DM (Cross-Industry Standard Process for Data Mining)
  • DCAM® (the Data Management Capability Assessment Model) (v1 2014, v2 May 2019) https://edmcouncil.org/
  • ARMA – Generally Accepted Recordkeeping Principles
  • DAMA-DMBOK (Data Management Body of Knowledge, by DAMA International)

Written by Anjar Priandoyo

September 14, 2020 at 3:48 pm

Posted in Science

Machine Learning and Big Data

Interesting, some might claim that Machine Learning is part of Big Data.

Written by Anjar Priandoyo

August 22, 2020 at 3:43 pm

Posted in Science

Data Analytics

Data Analytics is the new IT. Interestingly, some companies even claim they know data analytics simply because they use BI tools.

  1. https://powerbi.microsoft.com/en-us/downloads/ – Power BI Desktop
  2. https://www.tableau.com/en-gb/products/trial – Tableau Desktop
  3. https://www.qlik.com/us/trial/qlik-sense-business – Qlik Sense
  4. https://www.alteryx.com/designer-trial/free-trial – Alteryx Designer

Written by Anjar Priandoyo

August 13, 2020 at 3:21 pm

Posted in Science

Data Framework

Interestingly, there are many options to choose from for a data management framework.

DAMA-DMBOK by DAMA International and DCAM by the Enterprise Data Management Council (EDM Council) ref, ref

Written by Anjar Priandoyo

June 8, 2020 at 4:10 pm

Posted in Science

Big Data Ecosystem

Big Data Ecosystem (Cui, 2020) ref

  • Data collection and ingestion: log data collection (Flume), bulk data collection from a relational database (Sqoop), distributed messaging system (Kafka), dataflow (NiFi);
  • Computing engines: batch processing (MapReduce), iterative/near real-time processing (Spark, Flink), real-time processing/streaming (Storm, Flink);
  • Database: a relational database (RDBMS) has a standard schema but limited scalability (MySQL, Oracle DB, SQL Server, PostgreSQL); a NoSQL database has no standard schema and is scalable (four types: column-based: HBase; document-based: MongoDB; key-value-based: Redis; graph-based: Neo4j); a NewSQL database is a scalable relational database (VoltDB); search engine (Solr, Elasticsearch);
  • Data analysis (BDA): machine learning (MLlib, Caffe, TensorFlow, Python), statistics (SparkR, R), OLAP;
  • Data visualization (Zeppelin, Matplotlib, Tableau, D3, GraphX);
  • Workflow, which schedules the jobs of various big data tools, and dataflow, which manages data transfer and data transformation among different big data tools: Oozie, Kepler, Apache NiFi;
  • Data management and KM: Apache Falcon, Apache Atlas, Apache Sentry, Apache Hive, operations (ZooKeeper, Ambari), Apache Griffin, Apache Ranger, Apache Jena;
  • Big data infrastructure (BDI): computing resources (general purpose computing and HPC), cluster management (YARN, Mesos), network communication (Software-Defined Networks (SDN), InfiniBand, 5G), etc.;
  • Big data security: Apache Metron, Apache Knox.
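To make the "batch processing (MapReduce)" entry in the list above more concrete, here is a minimal sketch of the map, shuffle and reduce steps as a word count in plain Python. It is not the Hadoop API; the function names and the sample lines are made up for illustration, and a real cluster would spread each phase across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, standing in for files stored on a distributed file system.
sample_lines = ["big data is big", "data is data"]
counts = word_count(sample_lines)
print(counts)
```

The point of the split is that the map and reduce functions only ever see one line or one key at a time, which is what lets a batch engine run them in parallel.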

Written by Anjar Priandoyo

June 8, 2020 at 3:58 pm

Posted in Science

Big Data and Enterprise Architecture is politics

Enterprise Architecture is politics

Pentaho: ETL Loading
PowerBI: Structured data visualization
NoSQL: Hadoop

The V’s of Big Data are volume, velocity, variety, veracity, valence, and value, and each affects data collection, monitoring, storage, analysis, and reporting.

Hadoop limitations: it is unsuitable for:
OLTP (Online Transaction Processing)
OLAP (Online Analytical Processing)
DSS (Decision Support System)

Big Data Roadmap
Dashboard/Descriptive: Management Reporting
Analytics/Predictive

ref, ref

Written by Anjar Priandoyo

June 8, 2020 at 2:31 pm

Posted in Science

Data: Data, Datawarehouse, Big Data

Technically, nothing has changed from the 2000s to 2020. First we have the programmer, the one who is very good at logic (algorithms) and at finding a creative way to solve a problem. Second we have the database person, the one who is very good at structure (classification). Third we have the admin, the one who is not very good at programming or databases, basically the OS and network guy.

On the non-IT side we have sales (presales, sales engineer, account manager, RM), the ones who are very good at talking, and documentware, the ones who are very good with paper (auditor, security, consultant).

A data warehouse stack (or suite) usually consists of three layers, usually referred to as ETL (loading), database, and reporting (interface) ref. The DWH is for reads, the DB is for writes ref

A data mart is a subset of a data warehouse oriented to a specific business line.
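The three layers and the data mart idea can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name fact_sales, the view name mart_retail and the sample rows are invented here, and a real warehouse would use one of the ETL tools and databases listed in the notes below.

```python
import sqlite3

# Hypothetical source rows, standing in for an operational (write-side) database.
source_orders = [
    ("2020-05-01", "retail", "  100.50 "),
    ("2020-05-01", "corporate", "250.00"),
    ("2020-05-02", "retail", "75.25"),
]


def extract():
    """Extract: read raw rows from the source system."""
    return list(source_orders)


def transform(rows):
    """Transform: clean the amount text and convert it to a number."""
    return [(day, line, float(amount.strip())) for day, line, amount in rows]


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(day TEXT, business_line TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()


# Layer 1 (ETL) feeding layer 2 (database).
conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))

# Data mart: a subset of the warehouse for one business line, here as a view.
conn.execute(
    "CREATE VIEW mart_retail AS "
    "SELECT day, amount FROM fact_sales WHERE business_line = 'retail'"
)

# Layer 3 (reporting): read-only aggregate queries.
retail_total = conn.execute("SELECT SUM(amount) FROM mart_retail").fetchone()[0]
all_total = conn.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]
print(retail_total, all_total)
```

Using a view for the mart keeps a single copy of the data; a physically separate mart table would be the other common choice.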

Notes:

  • ETL Tools (loading, integration): Informatica, Pentaho (open), Talend (open), Microsoft SSIS (SQL Server Integration Services), IBM DataStage, BIRT
  • BI Tools: Tableau, Qlik, Power BI, Cognos
  • Big Data Platform: Hive, Spark, Hadoop, MapReduce, Hortonworks, Watson Data Platform, Kafka, Presto, Storm, Samza, Cloudera, Greenplum
  • Machine Learning Tools: Python, SparkML, Mahout
  • DevOps: JIRA, Confluence, Bitbucket, Bamboo, Jenkins, Artifactory, Git, Chef, Puppet, Ansible, Kubernetes, Docker, Nagios, Zabbix
  • Cloud Services: Amazon AWS, Google Cloud Platform, Azure
  • NoSQL: Redis, MongoDB, CouchDB, Cassandra, Elasticsearch
  • Cloud DB: Amazon Relational Database Service (RDS), Microsoft Azure SQL Database, IBM Db2 on Cloud, Google Cloud SQL
  • Data warehouse: Teradata, Amazon Redshift, Azure SQL Data Warehouse
  • Framework: DAMA-DMBOK

Cloud Database: SQL Data Model vs NoSQL Data Model

Kafka quickly picked up early users at Airbnb, Box, Cisco, PayPal, Square and Yahoo ref

MongoDB is a cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas.
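What "JSON-like documents" without a fixed schema means can be shown with plain Python dictionaries and the standard json module. This is a stand-in, not the MongoDB API: the customers list, the field names and the find helper are all hypothetical.

```python
import json

# Two hypothetical documents in the same "collection". They do not share
# the same fields, which a fixed relational table schema would not allow.
customers = [
    {"_id": 1, "name": "Ani", "email": "ani@example.com"},
    {
        "_id": 2,
        "name": "Budi",
        "phones": ["021-555-0100"],
        "address": {"city": "Jakarta"},  # nested sub-document
    },
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# Documents survive a round trip through JSON text unchanged.
serialized = json.dumps(customers)
restored = json.loads(serialized)

matches = find(restored, name="Budi")
city = matches[0]["address"]["city"]
print(city)
```

The second document carries a nested address and a list of phones while the first has neither, which is the practical difference from a row in an RDBMS table.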

Job: Data Engineer, ETL Developer, Data Mining, Data Scientist, Solution Architect, Big Data, Data Analytics, Business Intelligence

DWH vs Big Data: a DWH uses queries, while big data uses Hive and SparkSQL. DWH is an architecture; Big Data is a technology.

Graphic & Ref ref

Written by Anjar Priandoyo

May 6, 2020 at 4:21 pm

Posted in Science

Computer Science Destiny

Back when I was an undergraduate in computer science, we could choose a specialization within the program: information systems & database (ISD), network & infrastructure (NIS), algorithm & artificial intelligence (AAI), and multimedia & computer interaction. At that time, the most popular specialization was ISD, because it was assumed that we would be working as programmers, so an understanding of ISD was very important; maybe around 60-70% chose ISD. However, ISD had a bad reputation, because it required long sleepless nights developing something in Java/C/Delphi or other development software. For some people it was a nightmare, but some people enjoyed it, and the majority rules.

The second preference was NIS; around 10-15% chose it. The students who took this specialization were mainly the ones who liked to overclock their computers, manage servers, and perform computer installations. Basically the tough, strong, macho stereotype, and usually male. This group also preferred to work as system administrators. With the popularity of Linux and Cisco at that time, it made sense to be a network guy. The multimedia student stereotype was mainly female, with an interest in animation software like Macromedia/Adobe, which requires a sense of art. At that time, this was the smallest minority, due to lower supply and the fact that multimedia was not a mandatory course, unlike ISD, NIS and AAI.

The fourth choice was AAI. AAI had a reputation as a course with low marks. It was difficult, in both the theory and the practicum classes. Previous students who took this specialization had the image of being the smart ones, and the lecturers in this field were mainly senior lecturers, which made AAI the least popular specialization at that time. And I got AAI as my specialization. I can't remember clearly, but it was a combination of random selection by the university, in order to balance supervision among the lecturers, and the research we proposed. So if I am not mistaken, we submitted a research proposal, and the university decided.

So at that time, during the announcement of who got whom, some students might cry because they had proposed an ISD topic but their supervisor was from AAI, and some students smiled because their expectation and reality were aligned. But I think most students cried; I don't know why, but statistically no one was happy with the announcement of which supervisor would help them through the thesis journey, which was a very long and painful process.

As my destiny told me that I should take the AAI path, which at that time had the image of having no real-life practical implementation (in my understanding, no one used PROGOL and LISP back in the day), I was very afraid of whether I could pass the thesis journey or not. So I began to look for the branch of AAI that had prospects in the future. The journey began with Expert Systems, a branch of AI that has two components: the Inference Engine and the Knowledge Base. At that time, AI was based on 1990s technology, when the internet was not yet widespread, and its focus was on the inference engine, on finding algorithms that speed up the process efficiently. The rise of the knowledge base was underestimated because the internet as a large-scale information system was not yet mature.

So at that time, I decided to think ahead by learning about ontology, which is one way to represent a knowledge base as an object model. At that time object-oriented programming was on the rise, as Java was becoming popular compared to procedural programming in C/Pascal/Basic. Nowadays, ontology and information theory research is growing, but at that time no one knew, and no one could predict, that it would be popular in the future, even with Protégé and similar tools available.

So if there was no future for ontology, the semantic web and similar things at that time, at least in my understanding in Indonesia, did I give up? Of course not. I began to expand the word "knowledge" in knowledge base to a new level: knowledge management (KM). My assumption at that time was that KM had a bright prospect. I often heard about a senior of mine who worked at one of the largest banks in Indonesia as a knowledge management senior manager, and I also often heard of the appointment of a CKO (Chief Knowledge Officer), a cool title alongside CIO (Chief Information Officer). KM was also promoted by many parties as a solution to the rise of the internet. The argument was that with information overload we need to organize information as knowledge, or that a society based on knowledge would become very important and valuable.

So what was the reality? There was no prospect for conceptual ontology, no prospect for practical KM, and the only hope for a new graduate to move on was to find a job, either through the highly competitive mass recruitment at the university, or by moving speculatively to the capital city to look for one. A sad, scary and hopeless story that closed the episode of computer science education. As one graduation speech that I remember put it: “The happiness of being a university graduate lasts only one day, during the graduation; after that a series of stressful days will haunt us. Wish us luck.” A mantra that seemed true and was agreed on by most students at that time, in the aftermath of the 1998 financial crisis, with the crazy fluctuation of the US dollar and the lonely soul of a young man.

But what is the reality ten years after that? Well, no one remembers what happened in the past. Now I can guarantee that the majority of state university graduates live happily, with a lifestyle beyond that of the average Indonesian, regardless of their GPA or specialization. Sometimes, if you ask me whether the painful process in the past was useless, or whether we need to give students the skill to predict the future, the answer is simple: we never know what will happen in the future, and we will never be able to change the past, even though we have now become very smart.

But was it really useless after all? At least in my case, here is the connection.

Information
1. Data is something
2. Information is organization of something, is something that has meaning
3. Education is process of learning information
4. Degree is completion of education, measured by marking

Knowledge
1. Knowledge is explanation of something
2. Science is organization of knowledge
3. Research is process to get new knowledge, with hypothesis (proposed explanation) and procedure
4. Theory is a completion of research, tool for explanation, measured by peer-review

Written by Anjar Priandoyo

January 6, 2016 at 11:19 am

Posted in Life
