Simple queries: generate live and run quickly – as fast as possible – to serve immediately to users via WikiGrok (and potentially continue serving more results on the fly after user input).

Complex queries (e.g., the list of all possible suggested occupations): pre-generate the results and store them in a table or cache, so these queries can run longer (but still within some reasonable timeframe); regenerate as often as practical/possible.

Phase 2: support for public/external requests.

- External requests must return within a few seconds and use reasonable resources. How to enforce that constraint still needs to be determined, and it influences the architecture.
- Internal requests are allowed to use more resources and time. Internal requests must not crash external ones, and external requests must not be able to crush internal ones.
- The service needs to support continuous updates to reflect the latest Wikidata state. A lag of seconds, or even a minute or two, seems acceptable at this point, but nothing beyond that.
- It must handle high request volumes (horizontal scaling).

Candidate solutions

Titan:

- Robust: automatic handling of node failures, cross-datacenter replication, proven in production.
- Supports online modification (OLTP), so it can reflect the current state.
- Expressive query language (Gremlin), shared with other graph databases such as Neo4j.
- Implemented as a thin stateless layer on top of Cassandra or HBase: transparent sharding, replication, and fail-over.
- Async multi-cluster replication can be used for isolating research clusters and for DC fail-over.
- Supports relatively rich indexing, including complex indexes using ElasticSearch.
- Can gradually convert complex queries into simple(r) ones by propagating information on the graph and adding indexes.
- TinkerPop Blueprints support, including Gremlin and the GraphSail RDF interface.

WDQ, the current implementation, is a custom in-memory graph database written in C++ with a relatively expressive custom query language. It has several problems:

- The initial dump conversion is extremely slow, so if a server process crashes and its dump gets corrupted, you face a prolonged outage.
- Server startup time is not nice either, and it will only grow with the dataset. Even if we optimize it by an order of magnitude, it will still be slow.
- Unlike Redis, which we keep running for months, the less mature nature of WDQ and the need to cater for development and future bugs will mean a lot of restarts, each being a PITA.
- The DB query used to retrieve the latest changes cannot be run in production, so this part will have to be redone completely.
- The update routine has a possible race condition that makes it miss some changes.
- Each entity's properties are retrieved with a separate, uncacheable HTTP request to Special:EntityData, which isn't very fast; as the rate of changes increases, WDQ will bump into this hard and will not be able to cope with the updates.
- The thread synchronization model needs to be totally redone, as the currently used spinlocks aren't scalable.
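The race condition in the update routine described above is a classic pitfall of timestamp-based polling: a change committed slightly out of order can fall behind the "last seen" cutoff and be skipped forever. A common mitigation is to re-read an overlap window and de-duplicate by change id. The sketch below illustrates the idea in Python; `ChangePoller` and its `fetch` callback are invented names for illustration, not part of WDQ or any MediaWiki API.

```python
class ChangePoller:
    """Illustrative timestamp-based change poller that avoids missing changes.

    Instead of asking only for changes strictly newer than the last seen
    timestamp, it re-reads an overlap window (overlap_s seconds) and
    de-duplicates by change id, so a change committed slightly out of
    order is still picked up, and picked up exactly once.
    """

    def __init__(self, fetch, overlap_s=5):
        # fetch(since_ts) -> list of (change_id, timestamp) visible changes
        self.fetch = fetch
        self.overlap_s = overlap_s
        self.last_ts = 0
        self.seen_ids = set()  # a real implementation would prune old ids

    def poll(self):
        """Return changes not yet seen, in feed order."""
        since = max(0, self.last_ts - self.overlap_s)
        fresh = []
        for change_id, ts in self.fetch(since):
            if change_id in self.seen_ids:
                continue  # already processed in an earlier, overlapping poll
            self.seen_ids.add(change_id)
            self.last_ts = max(self.last_ts, ts)
            fresh.append((change_id, ts))
        return fresh
```

The overlap window trades a little redundant reading for correctness: changes that become visible late (but within the window) are no longer lost.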
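The pre-generate-and-cache approach for complex queries can be sketched as a small store that regenerates a result only once it is older than the chosen regeneration interval, keeping the expensive query out of the hot request path. This is a minimal illustrative sketch; `PrecomputedQueryCache` and its parameters are invented for this example and not part of WDQ or Titan.

```python
import time


class PrecomputedQueryCache:
    """Illustrative store for pre-generated complex query results.

    A result is regenerated lazily once it is older than max_age_s, so an
    expensive query runs at most once per regeneration interval while
    readers always get an answer immediately.
    """

    def __init__(self, max_age_s, clock=time.monotonic):
        self.max_age_s = max_age_s
        self.clock = clock
        self._store = {}  # query key -> (result, generated_at)

    def get(self, key, regenerate):
        """Return the cached result for `key`, regenerating it if stale."""
        entry = self._store.get(key)
        now = self.clock()
        if entry is None or now - entry[1] > self.max_age_s:
            result = regenerate()  # the long-running complex query
            self._store[key] = (result, now)
            return result
        return entry[0]
```

In production the regeneration would more likely run from a background job than lazily on read, but the staleness bookkeeping is the same either way.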