江湖術士 (The Wandering Charlatan) / John Wu's Narratives


Showing posts from March, 2021

MongoDB - The first sight

March 06, 2021

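Introduction
MongoDB is a document database designed for ease of development and scaling. Data is stored as BSON documents (similar to JSON objects) made up of field-and-value pairs. Besides strings and ints, a value can also be a document, an array of documents, and so on. Both MongoDB and DynamoDB can store geographic coordinates in GeoJSON format. ref: https://geojson.org / https://docs.mongodb.com/manual/geospatial-queries /
RDB vs. MongoDB terminology:
database -> database
table -> collection
row -> document
column -> field
MongoDB equivalents of SQL terms (aggregation mapping from SQL terms):
WHERE -> $match
GROUP BY -> $group
HAVING -> $match
SELECT -> $project
ORDER BY -> $sort
LIMIT -> $limit
SUM() -> $sum
COUNT() -> $sum / $sortByCount
join -> $lookup
MongoDB indexes
As in other NoSQL databases, a query that cannot use an index falls back to a full collection scan. An index is a data structure that is easy to traverse. It stores the values of a specific field in sorted order, so queries that use the index can efficiently match equality conditions and ranges. https://docs.mongodb.com/manual/indexes /
MongoDB index types
Single field: besides _id, you can define your own indexes on other fields. With only one field, the sort order of the index does not matter, because MongoDB can traverse the index in either direction. e.g.: {score: 1}
Compound index: an index defined on multiple fields. Suppose the index is {name: 1, score: -1}. The index is then sorted by name first and, within each name, by score (descending). Multike...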
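The two ideas above, compound index ordering and the SQL-to-aggregation mapping, can be illustrated without a running server. The following is a minimal sketch under stated assumptions. The sample documents and the `scores` collection are hypothetical. `compound_key` only mimics how entries of an index `{name: 1, score: -1}` are ordered; it is not how MongoDB builds indexes internally. The `pipeline` list uses real aggregation stage names and has the shape you would pass to pymongo's `collection.aggregate(pipeline)`.

```python
# Hypothetical sample documents, not taken from the original post.
docs = [
    {"name": "amy", "score": 70},
    {"name": "bob", "score": 95},
    {"name": "amy", "score": 88},
    {"name": "bob", "score": 60},
]


def compound_key(doc):
    """Mimic the entry order of a compound index {name: 1, score: -1}:
    name ascending first, then score descending within equal names."""
    return (doc["name"], -doc["score"])


index_order = sorted(docs, key=compound_key)

# SQL (hypothetical table `scores`):
#   SELECT name, SUM(score) AS total FROM scores
#   WHERE score >= 60 GROUP BY name HAVING total > 100
#   ORDER BY total DESC LIMIT 10
pipeline = [
    {"$match": {"score": {"$gte": 60}}},                        # WHERE
    {"$group": {"_id": "$name", "total": {"$sum": "$score"}}},  # GROUP BY + SUM()
    {"$match": {"total": {"$gt": 100}}},                        # HAVING
    {"$sort": {"total": -1}},                                   # ORDER BY ... DESC
    {"$limit": 10},                                             # LIMIT
    {"$project": {"_id": 0, "name": "$_id", "total": 1}},       # SELECT name, total
]

print([(d["name"], d["score"]) for d in index_order])
# [('amy', 88), ('amy', 70), ('bob', 95), ('bob', 60)]
```

Note that HAVING becomes a second `$match` placed after `$group`, because it filters on the grouped result rather than on the raw documents.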


Redis - the first sight

March 06, 2021

Introduction
Redis, a.k.a. REmote DIctionary Server, is an open-source, in-memory data structure store, used as a database, cache, and message broker. It provides data types such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, geospatial indexes, and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence. It provides high availability through Redis Sentinel and automatic partitioning with Redis Cluster.
Redis can perform atomic operations such as appending a string to a value, incrementing (or decrementing) a value in a hash, pushing an element onto a list, and computing set intersections and unions. Redis can persist data either by dumping the dataset to disk or by appending each command to a disk-based log. If you only use it as a cache, you do not need to enable persistence.
Side note: what are "atomic operations"?
In concurrent programming: an atomic operation runs completely independently of any other process, so no other process can observe it half-finished.
In the operating system kernel: most computer hardware, compilers, and libraries provide different levels of atomic operations. When a processor reads and writes the same memory location during data transmission, atomic operations can be used to ensure the data stays correct.
AWS ElastiCache offers Redis and Memcached, which can...
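To make "atomic increment" concrete, here is a minimal sketch in plain Python, not Redis itself. Redis gets this guarantee by executing commands one at a time. In this toy, a `threading.Lock` plays that role so that read, add, and write happen as one indivisible step. `AtomicHash` and its `hincrby` method are hypothetical names chosen to echo the Redis HINCRBY command. With a real server you would call a client library instead, for example redis-py's `hincrby(name, key, amount)`.

```python
import threading


class AtomicHash:
    """Hypothetical toy stand-in for a Redis hash.

    hincrby() imitates HINCRBY: the read-modify-write happens as one
    indivisible step, so concurrent increments are never lost."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def hincrby(self, field, amount=1):
        # Holding the lock means no other thread can read the old value
        # between our read and our write.
        with self._lock:
            self._data[field] = self._data.get(field, 0) + amount
            return self._data[field]

    def get(self, field):
        with self._lock:
            return self._data.get(field, 0)


counter = AtomicHash()


def worker():
    for _ in range(10_000):
        counter.hincrby("page:views")


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.get("page:views"))  # 40000
```

Without the lock, two threads could both read the same old value and one increment would be lost. That is exactly the race an atomic operation rules out.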


NoSQL and AWS DynamoDB - The first sight

March 06, 2021

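Introduction
NoSQL stands for "Not Only SQL": a non-traditional, non-relational database. Because it is built on a key-value architecture, designing the primary key is especially important. The primary key combines a partition key with an (optional) sort key. Because of architectural limits, queries can only evaluate conditions on the key (=, >=, <=, ...). So when you decide to use NoSQL as your database, the first question is which attribute to use as the key.
According to AWS, NoSQL databases are high-performance, non-relational databases with flexible data models. They are easy to deploy and easy to scale out.
When querying, the partition key must not be empty.
Types of NoSQL Databases
Key-value: look up data by a specific key, as you do with Redis.
Column-oriented: a storage model for distributed, massive datasets. Data is still looked up by key, and each key maps to multiple rows, e.g. Cassandra.
Document: a semi-structured storage format such as JSON, with more efficient queries, e.g. MongoDB.
Graph: graph structures, e.g. Neo4j.
In-memory: use cases such as leaderboards and session stores. Although Redis can deliver low-latency workloads, its data lives primarily in memory rather than being written to disk the way DynamoDB's is.
Search: if Elasticsearch is counted, it is an example of a NoSQL database optimized for search.
AWS DynamoDB
A DynamoDB primary key comes in two kinds:
Partition key: a single attribute whose value must be unique. It is passed through a hash function to determine the (physical) partition where the item is stored.
Partition key and sort key: two attributes combined. Items may share the same partition key, but then their sort keys must be unique. After hashing, items with the same partition key land in the same partition and are ordered by the sort key. When querying... the primary key value is a required condition and can never be empty. To query by other attributes, you can create a Secondary index.
Secondary indexes...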
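The composite primary key described above can be modeled in a few lines. This is a hypothetical in-memory sketch, not DynamoDB's real hash function, partition count, or API. `NUM_PARTITIONS`, `partition_for`, and `ToyTable` are made-up names, and SHA-256 stands in for whatever hashing DynamoDB uses internally. The model shows the three rules from the post:

- the partition key is hashed to pick a partition;
- items sharing a partition key are kept ordered by sort key;
- a query must supply the partition key and may add a sort-key range.

With boto3, the real equivalent is `Table.query` with a `KeyConditionExpression`.

```python
import bisect
import hashlib

NUM_PARTITIONS = 4  # hypothetical; real DynamoDB manages partitions for you


def partition_for(partition_key: str) -> int:
    """Hash the partition key to pick a (toy) physical partition."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


class ToyTable:
    """Hypothetical in-memory model of a table keyed by (partition key, sort key)."""

    def __init__(self):
        # partition id -> {partition key -> [(sort key, item), ...] kept sorted}
        self.partitions = [{} for _ in range(NUM_PARTITIONS)]

    def put(self, pk, sk, item):
        bucket = self.partitions[partition_for(pk)].setdefault(pk, [])
        sort_keys = [k for k, _ in bucket]
        i = bisect.bisect_left(sort_keys, sk)
        if i < len(bucket) and sort_keys[i] == sk:
            bucket[i] = (sk, item)  # same (pk, sk): overwrite, so the pair stays unique
        else:
            bucket.insert(i, (sk, item))  # insert in sort-key order

    def query(self, pk, sk_from=None, sk_to=None):
        """Items for one partition key, optionally limited to a sort-key range."""
        if not pk:
            raise ValueError("a query needs a partition key value")
        bucket = self.partitions[partition_for(pk)].get(pk, [])
        return [
            item
            for sk, item in bucket
            if (sk_from is None or sk >= sk_from) and (sk_to is None or sk <= sk_to)
        ]


table = ToyTable()
table.put("user#1", "2021-03-06", {"event": "post"})
table.put("user#1", "2021-02-20", {"event": "signup"})
table.put("user#1", "2021-03-01", {"event": "login"})
table.put("user#2", "2021-03-02", {"event": "login"})

print(table.query("user#1"))  # signup, login, post (ordered by sort key)
print(table.query("user#1", "2021-03-01", "2021-03-31"))  # login, post
```

ISO-style date strings are used as sort keys here because they compare correctly as plain strings, which is a common choice when the sort key is a timestamp.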


Java Hashtable, HashMap - The first sight

March 06, 2021

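Implement LRU cache in Java
Hashtable is one way to implement a map. Its basic operations are put and get, which store or fetch a value by key in the map table.
A HashMap instance has two parameters that affect performance: initial capacity and load factor. Put simply, you set the initial capacity, and once usage rises above a threshold, the capacity is doubled for the new keys put after that. This doubling is called table doubling.
Threshold = initial capacity * load factor
Because resizing hurts performance, before optimizing you can estimate roughly how many entries the map usually holds and add some headroom.
LinkedHashMap implements a map with a linked list plus a hash table, and has a predictable iteration order. Unlike the HashMap discussed last time, it keeps a doubly linked list running through all entries. When it is constructed in access order, that list runs from the least recently accessed entry to the most recently accessed one. That is why it is commonly used to implement an LRUCache in Java; in Python you can use OrderedDict instead.
Hashtable: extends Dictionary, synchronized, does not allow null keys or values.
HashMap: extends AbstractMap, not synchronized by default.
Both implement the Map interface.
Appendix
https://stackoverflow.com/questions/224868/easy-simple-to-use-lru-cache-in-java
http://www.ipshop.xyz/5103.html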
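The post notes that Python can use OrderedDict for an LRU cache. Here is a minimal sketch of that idea. `LRUCache` and `resize_threshold` are hypothetical names, not from the original post. In Java, the equivalent is usually a `LinkedHashMap` built with the `LinkedHashMap(initialCapacity, loadFactor, accessOrder)` constructor with `accessOrder = true`, plus an overridden `removeEldestEntry`. `resize_threshold` just evaluates Threshold = capacity * load factor; 16 and 0.75 are Java HashMap's default capacity and load factor.

```python
from collections import OrderedDict


class LRUCache:
    """LRU cache on top of OrderedDict: the front holds the least recently
    used key, the back holds the most recently used one."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # a read counts as a use
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry


def resize_threshold(capacity: int, load_factor: float) -> int:
    """Threshold = capacity * load factor; going past it triggers table doubling."""
    return int(capacity * load_factor)


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes most recently used
cache.put("c", 3)  # over capacity: "b" is evicted
print(list(cache.data))            # ['a', 'c']
print(resize_threshold(16, 0.75))  # 12
```

With the Java defaults, a HashMap resizes once it holds more than 12 entries. If you expect about 100 entries, an initial capacity of 256 gives a threshold of 192 and avoids any doubling.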


MySQL - The first sight

March 06, 2021

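MySQL Index
As is well known, MySQL InnoDB uses a B-tree data structure (specifically a B+tree). After reading @genchilu's article, I hand-drew the concept. Internal nodes store pointers to child nodes, and you only reach the data once you get to a leaf node. Look closely and you'll see that each leaf node also links to its neighboring node, which speeds up reading a range of data. A secondary index is basically another tree built on top of the clustered index. Its leaf nodes store the indexed column values together with the primary key, which is then used to look up the full row in the clustered index. Inverted indexes are also used in RDBs, for example in the design of MySQL InnoDB FULLTEXT indexes, which likewise exist to speed up full-text search.
Appendix
InnoDB index types https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html
InnoDB FULLTEXT index https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html
淺談 InnoDB 的 Cluster Index 和 Secondary Index (A Brief Look at InnoDB's Clustered Index and Secondary Index) https://medium.com/@genchilu/%E6%B7%BA%E8%AB%87-innodb-%E7%9A%84-cluster-index-%E5%92%8C-secondary-index-f75da308352e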
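The two points above can be sketched as a toy model: linked leaf nodes make range reads cheap, and a secondary index stores primary keys that lead back to the clustered index. This is not InnoDB's page format. `Leaf`, `build_leaves`, `range_scan`, the `page_size` of 3, and the sample ages are all hypothetical. To keep it short, a linear search for the first matching leaf stands in for descending through the internal nodes.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list
    values: list
    next: Optional["Leaf"] = None  # link to the right-hand neighbour leaf


def build_leaves(sorted_items, page_size=3):
    """Split (key, value) pairs, already sorted by key, into linked leaf pages."""
    leaves = []
    for i in range(0, len(sorted_items), page_size):
        chunk = sorted_items[i:i + page_size]
        leaves.append(Leaf([k for k, _ in chunk], [v for _, v in chunk]))
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(leaves, lo, hi):
    """Return values whose key satisfies lo <= key <= hi."""
    # Stand-in for walking root -> child pointers: first leaf whose last key >= lo.
    leaf = next((lf for lf in leaves if lf.keys[-1] >= lo), None)
    out = []
    while leaf is not None:
        start = bisect.bisect_left(leaf.keys, lo)
        for k, v in zip(leaf.keys[start:], leaf.values[start:]):
            if k > hi:
                return out
            out.append(v)
        leaf = leaf.next  # follow the sibling link instead of going back up the tree
    return out


AGES = {1: 30, 2: 25, 3: 41, 4: 25, 5: 33, 6: 19, 7: 52}  # hypothetical rows: id -> age

# Clustered index: leaves hold full rows, ordered by primary key.
clustered = build_leaves([(pk, {"id": pk, "age": age}) for pk, age in sorted(AGES.items())])
# Secondary index on age: leaves hold only (age, pk) -> pk, not the full row.
secondary = build_leaves(sorted(((age, pk), pk) for pk, age in AGES.items()))


def rows_with_age_between(lo_age, hi_age):
    pks = range_scan(secondary, (lo_age, float("-inf")), (hi_age, float("inf")))
    # Second step: look each primary key up in the clustered index.
    return [range_scan(clustered, pk, pk)[0] for pk in pks]


print([r["id"] for r in range_scan(clustered, 2, 5)])    # [2, 3, 4, 5]
print([r["id"] for r in rows_with_age_between(25, 33)])  # [2, 4, 1, 5]
```

The second lookup in `rows_with_age_between` is the extra step a secondary index costs. If a query only needs columns that are already stored in the secondary index, that step can be skipped.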


Elasticsearch - The first sight

March 03, 2021

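Introduction
Elasticsearch is a service based on Apache Lucene and developed in Java. Its main uses are search and log analysis. It is a NoSQL database: data can be unstructured, and it is queried mainly through its JSON-based Query DSL rather than SQL. For the basic concepts, see the slides at the link below:
Learning ES https://www.slideshare.net/rueian3/elasticsearch-45855699
What is an index? A shard?
"Index" can mean the database-like container inside ES, or the act of storing (indexing) a document. An inverted index is an index structure suited to full-text search.
An ES index is made up of several shards, which handle storing and indexing the documents. The process of splitting text into terms (tokenization) and normalizing them (normalization) is called analysis.
Analysis is performed by an analyzer in three parts. Character filters convert symbols into words and strip unwanted text. A tokenizer splits the string into meaningful terms. Token filters process the terms, for example lowercasing them or mapping synonyms.
So.... what is an inverted index?!
Recall the example from the slides above (slides p.34): Elasticsearch builds an inverted index to provide full-text search. Take the two documents from the slides:
Doc_1: The quick brown fox jumped over the lazy dog
Doc_2: Quick brown foxes leap over lazy dogs in summer
After analysis, the resulting inverted index looks roughly like this table:
Term  | Doc_1 | Doc_2
-------------------------
Quick |       | X
The   | X     |
brown | X     | X
dog   | X     |
dogs ...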
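The table above can be reproduced with a few lines of Python. This is a minimal sketch under two assumptions. First, the "analyzer" here is just a whitespace tokenizer with an optional lowercase token filter, much simpler than Elasticsearch's standard analyzer, and it does no stemming, so "dog" and "dogs" stay separate terms. Second, `analyze`, `build_inverted_index`, and `search` are hypothetical helper names. With `lowercase=False` the result matches the slide's table ("Quick" only in Doc_2). Turning on the lowercase filter merges "Quick" and "quick", which is the normalization step described above.

```python
from collections import defaultdict

DOCS = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}


def analyze(text, lowercase=False):
    """Toy analyzer: whitespace tokenizer plus an optional lowercase token filter."""
    tokens = text.split()
    return [t.lower() for t in tokens] if lowercase else tokens


def build_inverted_index(docs, lowercase=False):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text, lowercase):
            index[term].add(doc_id)
    return dict(index)


def search(index, query, lowercase=False):
    """Ids of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in analyze(query, lowercase)]
    return set.intersection(*postings) if postings else set()


raw = build_inverted_index(DOCS)
normalized = build_inverted_index(DOCS, lowercase=True)

print(raw["Quick"], raw["brown"], raw["dog"])               # {2} {1, 2} {1}
print(normalized["quick"])                                  # {1, 2}
print(search(normalized, "Quick brown", lowercase=True))    # {1, 2}
```

Because the same analyzer must run on both documents and queries, `search` passes the query through `analyze` with the same `lowercase` setting used to build the index.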
