Jazoon Day 2
Jazoon is an annual convention for Java and software development in Zurich. I have the opportunity to attend it, and I decided to blog about the experience.
Because the amount of material is overwhelming, I’m only going to write short notes and buzzwords (brain dumps). The intent is to convey the buzzwords of today’s software development market.
Here is a brain dump of Day-2.
Scalable, fault-tolerant and concurrent systems.
Twitter @jboner
Tradeoffs
- Performance vs scalability. Don’t confuse the two.
- Latency vs throughput.
- Availability vs consistency.
Concurrency
- Multicore on smaller devices.
- Shared mutable state and threads lead to nondeterministic code.
- Locks are not a silver bullet for solving concurrency problems.
- More tools: dataflow concurrency, actors, STM (software transactional memory).
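To make the actor idea a bit more concrete, here is a minimal sketch in plain Java. It is not Akka or any library shown in the talk: the `CounterActor` class and its methods are hypothetical names of mine, and a single-threaded executor stands in for the actor's mailbox. The point it illustrates is that the state is never shared between threads, so no locks are needed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical minimal "actor": state is touched only by the single mailbox thread.
class CounterActor {
    // The single-threaded executor plays the role of the mailbox (an assumption of this sketch).
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private long count = 0; // never shared; only the mailbox thread reads or writes it

    // Fire-and-forget message: the caller does not block.
    void increment() {
        mailbox.execute(() -> count++);
    }

    // Request/reply message: the reply arrives asynchronously through the future.
    CompletableFuture<Long> current() {
        return CompletableFuture.supplyAsync(() -> count, mailbox);
    }

    // Stops accepting messages; already queued messages are still processed.
    void stop() {
        mailbox.shutdown();
    }

    public static void main(String[] args) {
        CounterActor actor = new CounterActor();
        for (int i = 0; i < 1000; i++) {
            actor.increment();
        }
        // Messages from one sender are handled in the order they were sent.
        System.out.println(actor.current().join());
        actor.stop();
    }
}
```

Because every message is queued and handled one at a time, the increment never races with itself, which is the property the lock-based version has to fight for.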
Fault tolerance
- Use acknowledgement mechanisms.
Distributed
- Remote objects and RPC are bad in many respects.
- Use asynchronous message passing.
- CALM Conjecture.
- Bloom
Sessions:
Cassandra with big data
- NoSQL: not only SQL.
- Origins of Cassandra: Google Bigtable and Amazon Dynamo.
- Example implementations:
- Facebook inbox system
- Twitter realtime analytics.
- Netflix. 500 nodes
- Scandit
Main motives for using Cassandra:
- Partitioning.
- Simplicity. No master or slave nodes.
- Performs well in write-heavy environments.
- Proven scalability
- Tunable replication
- Data model
When using Cassandra, you have to give up the following features:
- Joins
- Transactions
- Referential integrity
- Expressive query language
- Strong consistency
Cassandra data model
- Column families (comparable to tables)
- Rows
- Columns
- Supercolumns. Deprecated.
Go Big Data
- Doesn’t mix well with OO. Hadoop.
- More towards functional approaches.
- Crunch and Scrunch, Cascading, Cascalog, Scalding.
- We need more real-time data processing
- Spark. Storm. GridGain. Akka.
- RDBMS: scaling reads and writes.
- ACID. Do we need that all the time?
- Don’t put all of your data in RDBMS.
- Brewer’s CAP (Consistency, Availability, Partition tolerance) theorem.
- You can get only two of them, not all three.
- RDBMS: CA (consistency and availability).
Event Sourcing
- Every state change is materialized in an Event.
- All events are sent to an EventProcessor.
- The EventProcessor stores all events in an EventLog.
- Snapshots could be saved every 1000 events.
- System reset is done by replaying the events from a snapshot.
- You can also use EventListener.
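The event-sourcing points above can be sketched roughly as follows. This is only an illustration under my own assumptions: the account-balance domain, the class name `EventSourcingSketch`, and its nested `Event` and `Snapshot` types are made up for the example; only the snapshot interval of 1000 events comes from the notes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical account example; all names here are illustrative.
class EventSourcingSketch {
    // An event records one state change (here: a signed amount).
    static final class Event {
        final long amount;
        Event(long amount) { this.amount = amount; }
    }

    // A snapshot stores the state as it was after the first `eventCount` events.
    static final class Snapshot {
        final long balance;
        final int eventCount;
        Snapshot(long balance, int eventCount) {
            this.balance = balance;
            this.eventCount = eventCount;
        }
    }

    static final int SNAPSHOT_INTERVAL = 1000; // "every 1000 events" from the notes

    private final List<Event> eventLog = new ArrayList<>();
    private Snapshot lastSnapshot = new Snapshot(0, 0);
    private long balance = 0;

    // The processor appends the event to the log, then applies it to the state.
    void process(Event e) {
        eventLog.add(e);
        balance += e.amount;
        if (eventLog.size() % SNAPSHOT_INTERVAL == 0) {
            lastSnapshot = new Snapshot(balance, eventLog.size());
        }
    }

    // Rebuilds the state from the latest snapshot plus the events logged after it.
    long replay() {
        long rebuilt = lastSnapshot.balance;
        for (int i = lastSnapshot.eventCount; i < eventLog.size(); i++) {
            rebuilt += eventLog.get(i).amount;
        }
        return rebuilt;
    }

    long balance() {
        return balance;
    }
}
```

After any sequence of `process` calls, `replay()` should return the same value as `balance()`, while only touching the events recorded since the last snapshot.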
Cassandra with big data
- Cassandra is scalable
- Schema and data types
- Schema is optional
- Cluster organization
- Nodes are arranged in a ring; each node is responsible for a certain token range.
Storing a Row:
- MD5 hash based on the row key
- Determine range of hash
- Implications
- MD5 hashing balances load
- No hotspots
- You can’t make range queries; instead you have to use one of two options:
- Option 1: OPP (order-preserving partitioner)
- Option 2: use columns instead of rows
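The row-placement steps above (hash the row key with MD5, then find the node whose range contains the hash) can be sketched like this. It is not Cassandra's partitioner code: the class `TokenRingSketch`, the convention that a node's token is the upper bound of its range, and the node names in the usage are my own assumptions, and the ring is assumed to contain at least one node.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical token ring; assumes at least one node has been added.
class TokenRingSketch {
    // Maps each node's token (treated as the upper bound of its range) to the node name.
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    void addNode(BigInteger token, String node) {
        ring.put(token, node);
    }

    // Hashes the row key with MD5 and reads the digest as a non-negative integer.
    static BigInteger token(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(rowKey.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // The owner is the first node whose token is >= the key's token, wrapping around the ring.
    String nodeFor(String rowKey) {
        Map.Entry<BigInteger, String> owner = ring.ceilingEntry(token(rowKey));
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }
}
```

Because neighbouring row keys hash to unrelated tokens, they land on unrelated nodes, which is why load is balanced and also why a range scan over row keys is not possible with this scheme.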
Replication Factor (RF)
- Rows are automatically stored on RF nodes: the node that owns the token range plus RF-1 further replicas (one further node if RF is 2).
Tunable replication strategy
- Clients can access any node.
- The coordinator forwards the request.
Consistency Levels (CL)
- ONE
- ALL
- QUORUM
- LOCAL_QUORUM
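The arithmetic usually quoted for these levels is that a quorum is a majority of the replicas, and that a read is guaranteed to overlap the latest write when the number of replicas read plus the number written exceeds RF. A small sketch of that arithmetic, with class and method names that are mine and not part of any Cassandra API:

```java
// Hypothetical helper; not part of any Cassandra client library.
class ConsistencySketch {
    // A quorum is a strict majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Read and write replica sets must overlap in at least one node: R + W > RF.
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }
}
```

With RF = 3, QUORUM means 2 replicas, so QUORUM reads combined with QUORUM writes overlap (2 + 2 > 3), while ONE reads combined with ONE writes do not (1 + 1 ≤ 3).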
Handling inconsistent data:
- Timestamps for columns supplied by client
- The latest timestamp always wins.
- Read repair
- Hinted handoff
- Anti-entropy node repair
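The "latest timestamp wins" rule can be sketched as below. The `LastWriteWinsSketch` class and its `Column` type are hypothetical, and keeping the first version on a timestamp tie is an assumption of this sketch rather than something stated in the session.

```java
import java.util.List;

// Hypothetical reconciliation helper; assumes a non-empty list of replica versions.
class LastWriteWinsSketch {
    // A column value together with the timestamp supplied by the client.
    static final class Column {
        final String value;
        final long timestamp;
        Column(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Picks the version with the highest timestamp among the replica responses.
    // On a tie the earlier entry in the list is kept (an assumption of this sketch).
    static Column resolve(List<Column> replicaVersions) {
        Column winner = replicaVersions.get(0);
        for (Column candidate : replicaVersions) {
            if (candidate.timestamp > winner.timestamp) {
                winner = candidate;
            }
        }
        return winner;
    }
}
```

Read repair would then, roughly speaking, write the winning version back to the replicas that returned an older one. Since the timestamps come from clients, clock skew between clients directly affects which write wins.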
Java in a multicore world:
- Concurrent threads
- Concurrency is not parallelism
- Multithreading does not automatically mean parallelism
- Virtualisation approach
- Message passing concurrency
- Software transactional memory
- Fine-grained parallelism. Sorting, aggregation
- Map-reduce
- Amdahl’s law
- Divide and conquer
- Fork-join framework
- JSR 166
- JDK 7
- jsr166y package for Java 6
- Extend RecursiveAction
- Override the compute method
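The last two points (extend RecursiveAction, override compute) look roughly like this with the JDK 7 fork-join classes. The task itself, squaring every element of an array in place, as well as the `SquareAction` name and the threshold of 1000 elements, are choices of mine for illustration and not from the session.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical divide-and-conquer task: squares data[from..to) in place.
class SquareAction extends RecursiveAction {
    private static final int THRESHOLD = 1_000; // assumed cut-off for sequential work
    private final long[] data;
    private final int from;
    private final int to;

    SquareAction(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: do the work sequentially.
            for (int i = from; i < to; i++) {
                data[i] *= data[i];
            }
            return;
        }
        // Divide: split the range in two and let the pool run both halves.
        int mid = (from + to) >>> 1;
        invokeAll(new SquareAction(data, from, mid), new SquareAction(data, mid, to));
    }

    // Convenience entry point: runs the task on a fresh pool and waits for it to finish.
    static void squareAll(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new SquareAction(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

The two halves work on disjoint index ranges, so no synchronisation is needed inside compute. How much speed-up this gives depends on the sequential fraction of the work, which is the point of Amdahl’s law in the list above.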