Working Notes (Part C)
----------------------
This file contains a diary of random working notes, which I use to keep
track of what the heck it is that I'm doing. It is almost surely totally
useless to you, except maybe for some weird voyeuristic reasons.

======================================================================
Feb 2021
--------
Commissioning run for calibration (fake language)

Crash.
ulimit -c unlimited
ulimit -a

(cog-rocks-stats)
Connected to rocks:///home/ubuntu/data/fake_pairs.rdb
Database contents:
  Next aid: 633
  Atoms/Links/Nodes a@: 633 l@: 586 n@: 46
  Keys/Incoming/Hash k@: 338 i@: 336 h@: 0

Thread 172410 "guile" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff57fff700 (LWP 14341)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7cbe859 in __GI_abort () at abort.c:79
#2  0x00007ffff7cbe729 in __assert_fail_base (
    fmt=0x7ffff7e54588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=0x7ffff787262d "mutex->__data.__owner == 0",
    file=0x7ffff78725fa "../nptl/pthread_mutex_lock.c", line=117,
    function=) at assert.c:92
#3  0x00007ffff7ccff36 in __GI___assert_fail (
    assertion=assertion@entry=0x7ffff787262d "mutex->__data.__owner == 0",
    file=file@entry=0x7ffff78725fa "../nptl/pthread_mutex_lock.c",
    line=line@entry=117,
    function=function@entry=0x7ffff7872790 <__PRETTY_FUNCTION__.10174> "__pthread_mutex_lock") at assert.c:101
#4  0x00007ffff78661a9 in __GI___pthread_mutex_lock (mutex=)
    at ../nptl/pthread_mutex_lock.c:117
#5  0x00007fffececc047 in __gthread_mutex_lock (__mutex=0x5555559b1560)
    at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749
#6  __gthread_recursive_mutex_lock (__mutex=0x5555559b1560)
    at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:811
#7  std::recursive_mutex::lock (this=0x5555559b1560)
    at /usr/include/c++/9/mutex:106
#8
std::unique_lock::lock (this=, this=) at /usr/include/c++/9/bits/unique_lock.h:141 #9 std::unique_lock::unique_lock (__m=..., this=) at /usr/include/c++/9/bits/unique_lock.h:71 #10 opencog::AtomTable::add (this=this@entry=0x5555559b1560, orig=..., force=force@entry=false) at /home/ubuntu/src/atomspace/opencog/atomspace/AtomTable.cc:216 #11 0x00007fffecec1b85 in opencog::AtomSpace::add_node ( this=this@entry=0x5555559b1560, t=, t@entry=240, name=...) at /home/ubuntu/src/atomspace/opencog/atomspace/AtomSpace.cc:287 #12 0x00007fffeceff99b in opencog::SchemeSmob::ss_new_node ( stype=, sname=, kv_pairs=0x304) at /home/ubuntu/src/atomspace/opencog/guile/SchemeSmobNew.cc:390 (gdb) print in $4 = "(observe-text \" n j d s b o u g\")\n" scheme@(guile-user)> ==7309== Thread 57: ==7309== Syscall param futex(futex) points to unaddressable byte(s) ==7309== at 0x4FA2839: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:58) ==7309== by 0x4FA2839: pthread_mutex_unlock (pthread_mutex_unlock.c:357) ==7309== by 0x10FAC951: rocksdb::WriteThread::ExitAsBatchGroupFollower(rocksdb::WriteThread::Writer*) (in /usr/lib/librocksdb.so.5.17.2) ==7309== by 0x10EFCCAC: rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*) (in /usr/lib/librocksdb.so.5.17.2) ==7309== by 0x10EFE9AF: rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*) (in /usr/lib/librocksdb.so.5.17.2) ==7309== by 0x10EFEC1B: rocksdb::DB::Delete(rocksdb::WriteOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&) (in /usr/lib/librocksdb.so.5.1 7.2) ==7309== by 0x10EFEC7F: rocksdb::DBImpl::Delete(rocksdb::WriteOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&) (in /usr/lib/librocksdb.so.5.17.2) ==7309== by 0x10E65771: rocksdb::DB::Delete(rocksdb::WriteOptions const&, rocksdb::Slice const&) (in /usr/lib/librocksdb.so.5.17.2) 
==7309== by 0x10BC6D97: opencog::RocksStorage::storeAtom(opencog::Handle const&, bool) (RocksIO.cc:254) ==7309== by 0x10B5F908: opencog::PersistSCM::dflt_store_atom(opencog::Handle) (PersistSCM.cc:289) ==7309== by 0x10B63459: conv_call_method<0> (SchemePrimitive.h:253) ==7309== by 0x10B63459: cpp_invoke (SchemePrimitive.h:261) ==7309== by 0x10B63459: opencog::SchemePrimitive::invoke(scm_unused_struct*) (SchemePrimitive.h:399) ==7309== by 0x1011B8CD: opencog::PrimitiveEnviron::do_call(scm_unused_struct*, scm_unused_struct*) (SchemePrimitive.cc:172) ==7309== by 0x2464E105: ??? ==7309== Address 0x29042bc0 is on thread 62's stack ==7309== 1384 bytes below stack pointer ==7309== and again .. just like above ... ==7309== Warning: unimplemented fcntl command: 1036 ==7309== Thread 59: ==7309== Syscall param futex(futex) points to unaddressable byte(s) ==7309== at 0x4FA2839: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:58) ... ==7309== Address 0x45bd3bc0 is on thread 48's stack ==7309== 1384 bytes below stack pointer ==7309== ==7309== Warning: unimplemented fcntl command: 1036 ==7309== Warning: unimplemented fcntl command: 1036 identical stack trace, different thread. maybe put a lock into storeAtom circa line 254??? gdb: almost identical stack trace, except add_link: #8 std::unique_lock::lock (this=, this=) at /usr/include/c++/9/bits/unique_lock.h:141 #9 std::unique_lock::unique_lock (__m=..., this=) at /usr/include/c++/9/bits/unique_lock.h:71 #10 opencog::AtomTable::add (this=this@entry=0x55d164c980c0, orig=..., force=force@entry=false) at /home/ubuntu/src/atomspace/opencog/atomspace/AtomTable.cc:216 #11 0x00007f053765ac75 in opencog::AtomSpace::add_link ( this=this@entry=0x55d164c980c0, t=, t@entry=137, outgoing=...) 
at /home/ubuntu/src/atomspace/opencog/atomspace/AtomSpace.cc:304 #12 0x00007f0537697cf3 in opencog::SchemeSmob::ss_new_link ( stype=, satom_list=0x55d1682d6f70) at /home/ubuntu/src/atomspace/opencog/guile/SchemeSmobNew.cc:508 * 1 Thread 0x7f051a7fc700 (LWP 26727) __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 2 Thread 0x7f05412c5740 (LWP 8955) __GI___libc_read (nbytes=1, buf=0x55d1647fcad0, fd=0) at ../sysdeps/unix/sysv/linux/read.c:26 scm_readline 3 Thread 0x7f05406f7700 (LWP 8956) futex_wait_cancelable ( private=, expected=0, futex_word=0x7f0541099dec) at ../sysdeps/nptl/futex-internal.h:183 GC_wait_marker ... also 4 5 7 13 14 15 23 24 6 is ConsoleSocket 8 is rocksdb::ThreadPoolImpl::Impl::BGThread 16 17 19 20 22 25 26 9 is opencog::Logger::LogWriter::writing_loop 10 Thread 0x7f04a4ff9700 (LWP 26729) __pthread_clockjoin_ex ( threadid=139657933203200, thread_return=0x0, clockid=, abstime=, block=) 10 is opencog::GenericShell::~GenericShell also 11 also 39 12 is ~ConsoleSocket 18 Thread 0x7f04a67fc700 (LWP 9013) __GI___libc_read (nbytes=1, buf=0x7f04a67fb5c0, fd=22) at ../sysdeps/unix/sysv/linux/read.c:26 GC_do_blocking_inner also 43 21 is immortal_thread 27 is opencog::SchemeEval::do_eval 31 Thread 0x7f04a6ffd700 (LWP 10064) __libc_recvmsg (flags=0, msg=0x7f04a6ffcbe0, fd=16) at ../sysdeps/unix/sysv/linux/recvmsg.c:28 31 is boost::asio::detail::socket_ops::recv ServerSocket.cc:143 35 Thread 0x7f0532fae700 (LWP 8976) __lll_lock_wait ( futex=futex@entry=0x55d16514e698, private=0) at lowlevellock.c:52 35 is opencog::CogServer::runLoopStep 45 is shutdown .. 
asio opencog::ServerConsole::Exit 52 is add_link #0 __lll_lock_wait (futex=futex@entry=0x55d164c980c0, private=0) at lowlevellock.c:52 #1 0x00007f0540c7c131 in __GI___pthread_mutex_lock (mutex=0x55d164c980c0) at ../nptl/pthread_mutex_lock.c:115 #2 0x00007f0537665047 in __gthread_mutex_lock (__mutex=0x55d164c980c0) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749 #3 __gthread_recursive_mutex_lock (__mutex=0x55d164c980c0) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:811 #4 std::recursive_mutex::lock (this=0x55d164c980c0) at /usr/include/c++/9/mutex:106 #5 std::unique_lock::lock (this=, this=) at /usr/include/c++/9/bits/unique_lock.h:141 #6 std::unique_lock::unique_lock (__m=..., this=) at /usr/include/c++/9/bits/unique_lock.h:71 #7 opencog::AtomTable::add (this=this@entry=0x55d164c980c0, orig=..., force=force@entry=false) at /home/ubuntu/src/atomspace/opencog/atomspace/AtomTable.cc:216 #8 0x00007f053765ac75 in opencog::AtomSpace::add_link ( this=this@entry=0x55d164c980c0, t=, t@entry=137, outgoing=...) 
at /home/ubuntu/src/atomspace/opencog/atomspace/AtomSpace.cc:304 #9 0x00007f0537697cf3 in opencog::SchemeSmob::ss_new_link ( stype=, satom_list=0x55d1682ac410) at /home/ubuntu/src/atomspace/opencog/guile/SchemeSmobNew.cc:508 stack #6 lock is print __m $1 = (std::unique_lock::mutex_type &) @0x55d164c980c0: { = {_M_mutex = {__data = {__lock = 2, __count = 0, __owner = -955664080, __nusers = 32514, __kind = 1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\060\271\t\307\002\177\000\000\001", '\000' , __align = 2}}, } stack 7 print _mtx $2 = { = {_M_mutex = {__data = {__lock = 2, __count = 0, __owner = -955664080, __nusers = 32514, __kind = 1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\060\271\t\307\002\177\000\000\001", '\000' , __align = 2}}, } (gdb) print lck $3 = {_M_device = 0x55d164c980c0, _M_owns = } thr 1 stack #10 Its the same lock w/ same lock corruption.` Happened again... Above is with stock rocks on ubuntu 20.04 LTS focal sudo apt purge librocksdb-dev librocksdb5.17 (october 2018) git clone https://github.com/facebook/rocksdb make shared_lib guile: symbol lookup error: /usr/local/lib/opencog/libpersist-rocks.so: undefined symbol: ZSTD_versionNumber or .. libasan (cog-get-all-roots) (fetch-all-words) (load-atoms-of-type 'WordNode) after pair-mi: $ du -s ~/data/expt-1 4820 /home/ubuntu/data/expt-1 (cog-rocks-stats) cog-rocks-stats: Atomspace holds 731 atoms Next aid: 734 Atoms/Links/Nodes a@: 733 l@: 674 n@: 55 Keys/Incoming/Hash k@: 1122 i@: 379 h@: 0 MST counting: crash: terminate called after throwing an instance of 'opencog::RuntimeException' what(): AtomTable - deleteing atomtable 1 which has subtables! (/home/ubuntu/src/atomspace/opencog/atomspace/AtomTable.cc:97) Aborted Core was generated by `guile -l mst-count-fake.scm'. Program terminated with signal SIGABRT, Aborted. 
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. [Current thread is 1 (Thread 0x7f6ce97ea700 (LWP 31842))] #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f6d0d415859 in __GI_abort () at abort.c:79 #2 0x00007f6d0282e951 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6 #3 0x00007f6d0283a47c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6 #4 0x00007f6d02839459 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6 #5 0x00007f6d02839e11 in __gxx_personality_v0 () from /lib/x86_64-linux-gnu/libstdc++.so.6 #6 0x00007f6d02738bdf in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1 #7 0x00007f6d02739271 in _Unwind_RaiseException () from /lib/x86_64-linux-gnu/libgcc_s.so.1 #8 0x00007f6d0283a78c in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6 #9 0x00007f6d02d09fef in opencog::AtomTable::~AtomTable (this=0x563608e307a0, __in_chrg=) at /home/ubuntu/src/atomspace/opencog/atomspace/AtomTable.cc:97 #10 0x00007f6d02d443dc in opencog::SchemeSmob::release_as (as=) at /home/ubuntu/src/atomspace/opencog/guile/SchemeSmobAS.cc:127 #11 0x00007f6d02d44f03 in opencog::SchemeSmob::free_misc (node=) at /home/ubuntu/src/atomspace/opencog/guile/SchemeSmobGC.cc:61 #12 0x00007f6d0d38bdff in GC_invoke_finalizers () from /lib/x86_64-linux-gnu/libgc.so.1 #13 0x00007f6d0d69e968 in ?? () from /lib/x86_64-linux-gnu/libguile-3.0.so.1 __in_chrg=) at /home/ubuntu/src/atomspace/opencog/atomspace/AtomTable.cc:97 #10 0x00007f6d02d443dc in opencog::SchemeSmob::release_as (as=) at /home/ubuntu/src/atomspace/opencog/guile/SchemeSmobAS.cc:127 #11 0x00007f6d02d44f03 in opencog::SchemeSmob::free_misc (node=) at /home/ubuntu/src/atomspace/opencog/guile/SchemeSmobGC.cc:61 Bad!!! 
#12 0x00007f6d0d38bdff in GC_invoke_finalizers () from /lib/x86_64-linux-gnu/libgc.so.1

Fixed in pull request https://github.com/opencog/atomspace/pull/2796

after pair-mi:
$ du -s ~/data/expt-1
4820 /home/ubuntu/data/expt-1

(cog-rocks-stats)
cog-rocks-stats: Atomspace holds 731 atoms
  Next aid: 734
  Atoms/Links/Nodes a@: 733 l@: 674 n@: 55
  Keys/Incoming/Hash k@: 1122 i@: 379 h@: 0

after mst:
$ du -s ~/data/expt-1
178256 /home/ubuntu/data/expt-1
20420 /home/ubuntu/data/expt-1 after auto-compaction
  Next aid: 39919
  Atoms/Links/Nodes a@: 39918 l@: 39845 n@: 57
  Keys/Incoming/Hash k@: 22173 i@: 18589 h@: 0

Error: support-api: There isn't any cached data on cset
Run `((add-support-compute LLOBJ) 'cache-all)` to compute that data
Error: support-api: There isn't any cached data on cset
Run `((add-support-compute LLOBJ) 'cache-all)` to compute that data

RocksIO.cc:563
remFromSidList
removeSatom (is locked...)
remIncoming (is locked...
remIncoming

(use-modules (opencog) (opencog persist) (opencog persist-rocks))
(cog-rocks-open "rocks:///tmp/foo")
(define a (Concept "a"))
(define b (Concept "b"))
(define l (List a b a b))
(store-atomspace)
(cog-delete! l)

RocksIO.cc:306

Speed is about 16 sentences/second for pair-counting.
135 sentences/sec for MPG parsing

repl:
guile -l pair-count-fake.scm -- a b c c-start-cogserver

(print-matrix-summary-report star-obj)
((add-support-compute star-obj) 'cache-all)
((make-central-compute star-obj) 'cache-all)

Spinning disks:
iostat -d 5
sudo iotop -d 5
shows about 15MB/sec to 40MB/sec of disk-writes to spinning disk
(dm-0 aka md15 aka sdb/sde)

speed:
Stored 200000 of 394854 left-wilds in 287 secs (139 pairs/sec)
Stored 240000 of 394854 left-wilds in 353 secs (113 pairs/sec)
Stored 280000 of 394854 left-wilds in 413 secs (97 pairs/sec)
Stored 320000 of 394854 left-wilds in 379 secs (106 pairs/sec)
Nasty, old postgres was 10x faster on SSD.
Done storing 394854 left-wilds in 2912 secs

// Never mind, above is due to a stupid debug print...
as_ref_count: oas

Uh no, after removing the print it's still slow.
Stored 200000 of 394854 left-wilds in 213 secs (188 pairs/sec)
Stored 240000 of 394854 left-wilds in 239 secs (167 pairs/sec)
Stored 280000 of 394854 left-wilds in 286 secs (140 pairs/sec)
Stored 320000 of 394854 left-wilds in 347 secs (115 pairs/sec)

cog-rocks-stats: Atomspace holds 1265536 atoms
Connected to rocks:///home/ubuntu/data/expt-3/mpg_parse.rdb
Database contents:
  Next aid: 1265559
  Atoms/Links/Nodes a@: 1265558 l@: 1265494 n@: 37
  Keys/Incoming/Hash k@: 870932 i@: 790044 h@: 0

du -s /home/ubuntu/data/expt-3/*
1140496 /home/ubuntu/data/expt-3/mpg_parse.rdb
re-open and close:
566020 /home/ubuntu/data/expt-3/mpg_parse.rdb

gram-1.rdb is (gram-classify-greedy-discrim 0.5 4)
gram-2.rdb is (gram-classify-greedy-fuzz 0.65 0.3 4)

a b: ;
c: ;
d e: ;
f g: ;
h i j: ;
k l m n: ;
o: ;
p: ;
q r s: ;
t u: ;

Throw to key `C++-EXCEPTION' with args `("dflt-delete" "Internal Error! (/home/ubuntu/src/atomspace-rocks/opencog/persist/rocks/RocksIO.cc:563)\nFunction args:\n((Section (ctv 1 0 0)\n (WordNode \"s\" (ctv 1 0 1.19871e+07))\n (ConnectorSeq\n (Connector\n (WordNode \"###LEFT-WALL###\" (ctv 1 0 1.24572e+07))\n (ConnectorDir \"-\"))\n (Connector\n (WordNode \"o\" (ctv 1 0 2.3094e+06))\n (ConnectorDir \"-\"))\n (Connector\n (WordNode \"p\" (ctv 1 0 1.32314e+07))\n (ConnectorDir \"-\"))\n (Connector\n (WordNode \"u\" (ctv 1 0 1.05214e+07))\n (ConnectorDir \"-\"))))\n)")'.

Can't find the sid in the sid-list...
RocksStorage::remIncoming
RocksStorage::removeSatom(
RocksStorage::removeAtom

Can't find sid=KI4< in sidlist=JqF BTG irD1 nFX2 NoU5 <
klist is=i@XI4:Section<
rei osatom= (this should be XI4, and since it's in KI4, KI4 should be in its oset.)
(ConnectorSeq (Connector (WordNode "###LEFT-WALL###")(ConnectorDir "-")) (Connector (WordNode "o")(ConnectorDir "-")) (Connector (WordNode "p")(ConnectorDir "-")) (Connector (WordNode "u")(ConnectorDir "-"))) rein satom= (this is sid KI4) we're trying to delete this (Section (WordNode "s") (ConnectorSeq (Connector (WordNode "###LEFT-WALL###")(ConnectorDir "-")) (Connector (WordNode "o")(ConnectorDir "-")) (Connector (WordNode "p")(ConnectorDir "-")) (Connector (WordNode "u")(ConnectorDir "-")))) sid should be the section... KI4 ConSeq should be XI4 klist is the osid of the osatom std::string sidlist; rocksdb::Status s = _rfile->Get(rocksdb::ReadOptions(), klist, &sidlist); export ROCKS_DB_URL=rocks:///home/ubuntu/data/expt-3/gram-2.rdb Its not there in gram-1.db either Its not there in mpg_parse.rdb ... print_all BlockBasedTableOptions table_options; table_options.filter_policy.reset(NewBloomFilterPolicy(10, false)); table_options.optimize_filters_for_memory = true; auto table_factory = new BlockBasedTableFactory(table_options); rocksdb::Options options; options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_options)); rocksdb::DB* db; rocksdb::DB::Open(options, name, &db); table.h 1/11 Test #1: BasicSaveUTest ................... Passed 7.03 sec 2/11 Test #2: ValueSaveUTest ................... Passed 2.62 sec 3/11 Test #3: PersistUTest ..................... Passed 0.78 sec 4/11 Test #4: FetchUTest ....................... Passed 3.21 sec 5/11 Test #5: BasicDeleteUTest ................. Passed 3.39 sec 6/11 Test #6: DeleteUTest ...................... Passed 0.60 sec 7/11 Test #7: AlphaEquivUTest .................. Passed 2.93 sec 8/11 Test #8: MultiPersistUTest ................ Passed 1.53 sec 9/11 Test #9: QueryPersistUTest ................ Passed 2.09 sec 10/11 Test #10: LargeFlatUTest ................... Passed 76.56 sec 11/11 Test #11: LargeZipfUTest ................... 
Passed 167.55 sec Total Test time (real) = 268.36 sec Total Test time (real) = 254.93 sec 1/11 Test #1: BasicSaveUTest ................... Passed 6.60 sec 2/11 Test #2: ValueSaveUTest ................... Passed 2.13 sec 3/11 Test #3: PersistUTest ..................... Passed 0.84 sec 4/11 Test #4: FetchUTest ....................... Passed 2.61 sec 5/11 Test #5: BasicDeleteUTest ................. Passed 2.55 sec 6/11 Test #6: DeleteUTest ...................... Passed 0.57 sec 7/11 Test #7: AlphaEquivUTest .................. Passed 2.81 sec 8/11 Test #8: MultiPersistUTest ................ Passed 0.72 sec 9/11 Test #9: QueryPersistUTest ................ Passed 2.53 sec 10/11 Test #10: LargeFlatUTest ................... Passed 75.35 sec 11/11 Test #11: LargeZipfUTest ................... Passed 172.17 sec Total Test time (real) = 268.90 sec Total Test time (real) = 271.28 sec (use-modules (opencog)) (use-modules (opencog persist)) (use-modules (opencog persist-rocks)) (define a (Concept "a")) (define b (Concept "b")) (define l1 (List a b)) (define l2 (List l1 a)) (cog-rocks-open "rocks:///tmp/foo") (store-atom l2) (cog-delete-recursive! a) Boom. $ du -s *rdb 7016 fake_pairs.rdb 21710584 gram-2.rdb <<<<<<<<<<< wtf 417488 mpg_parse-no-margs.rdb 954720 mpg_parse.rdb closing and reopening does this: 538360 gram-2.rdb ---------Bingo! 
Dist=0.7606 for class "j n" -- "t" Can't find sid=sQ23< in sidlist=xQ23 pNu5 < inset key=i@0R23:Section< remin osatom= (ConnectorSeq (Connector (WordNode "###LEFT-WALL###")(ConnectorDir "-")) (Connector (WordNode "j")(ConnectorDir "-")) (Connector (WordNode "g")(ConnectorDir "-")) (Connector (WordNode "t")(ConnectorDir "-")) (Connector (WordNode "s")(ConnectorDir "-")) (Connector (WordNode "s")(ConnectorDir "-"))) reminc satom=sQ23 (Section (WordNode "t") (ConnectorSeq (Connector (WordNode "###LEFT-WALL###")(ConnectorDir "-")) (Connector (WordNode "j")(ConnectorDir "-")) (Connector (WordNode "g")(ConnectorDir "-")) (Connector (WordNode "t")(ConnectorDir "-")) (Connector (WordNode "s")(ConnectorDir "-")) (Connector (WordNode "s")(ConnectorDir "-")))) well foo, delete was n=0 r=0 satom=sQ23 (Section (WordNode "t")(ConnectorSeq (Connector (WordNode "###LEFT-WALL###")(ConnectorDir "-"))(Connector (WordNode "j")(ConnectorDir "-"))(Connector (WordNode "g")(ConnectorDir "-"))(Connector (WordNode "t")(ConnectorDir "-"))(Connector (WordNode "s")(ConnectorDir "-"))(Connector (WordNode "s")(ConnectorDir "-")))) ice-9/boot-9.scm:1669:16: In procedure raise-exception: Throw to key `C++-EXCEPTION' with args `("dflt-delete" "Internal Error! (/home/ubuntu/src/atomspace-rocks/opencog/persist/rocks/RocksIO.cc:582) Function args: ((Section (ctv 1 0 0) (WordNode "t" (ctv 1 0 1.0531e+07)) (ConnectorSeq (Connector (WordNode "###LEFT-WALL###" (ctv 1 0 1.24572e+07)) (ConnectorDir "-")) (Connector (WordNode "j" (ctv 1 0 1.56322e+07)) (ConnectorDir "-")) (Connector (WordNode "g" (ctv 1 0 4.84869e+06)) (ConnectorDir "-")) (Connector (WordNode "t" (ctv 1 0 1.0531e+07)) (ConnectorDir "-")) (Connector (WordNode "s" (ctv 1 0 1.19871e+07)) (ConnectorDir "-")) (Connector (WordNode "s" (ctv 1 0 1.19871e+07)) (ConnectorDir "-")))) )")'. (cog-rocks-get "i@0R23:Section") yes its missing (cog-rocks-get "a@sQ23") this is the section. 
(cog-rocks-get "l@(ConnectorSeq (Connector (WordNode \"###LEFT-WALL###\")(ConnectorDir \"-\"))(Connector (WordNode \"j\")(ConnectorDir \"-\"))(Connector (WordNode \"g\")(ConnectorDir \"-\"))(Connector (WordNode \"t\")(ConnectorDir \"-\"))(Connector (WordNode \"s\")(ConnectorDir \"-\"))(Connector (WordNode \"s\")(ConnectorDir \"-\")))")
yes, this is 0R23 so wtf...

Huh. mpg_parse:
(cog-rocks-get "i@0R23:Section") ... also missing...
(cog-rocks-get "a@sQ23") is the expected section.
(cog-rocks-get "l@ returns 0R23 as expected... so wtf.

How about no-margs... also missing there...
(cog-rocks-get "l@(Section (WordNode \"t\")(ConnectorSeq (Connector (WordNode \"###LEFT-WALL###\")(ConnectorDir \"-\"))(Connector (WordNode \"j\")(ConnectorDir \"-\"))(Connector (WordNode \"g\")(ConnectorDir \"-\"))(Connector (WordNode \"t\")(ConnectorDir \"-\"))(Connector (WordNode \"s\")(ConnectorDir \"-\"))(Connector (WordNode \"s\")(ConnectorDir \"-\"))))")
gives sQ23 exactly as expected.

... Conclude: dj parsing fails to write the sections correctly ...
why? How? WTF???

/data/expt-3/bad-again/mpg_parse-no-margs.rdb
  Next aid: 870677
  Atoms/Links/Nodes a@: 870676 l@: 870624 n@: 34
  Keys/Incoming/Hash k@: 476041 i@: 395182 h@: 0

Mising sid dR92 (ConnectorSeq (Connector (WordNode "c")(ConnectorDir "-"))(Connector (WordNode "p")(ConnectorDir "+"))(Connector (WordNode "c")(ConnectorDir "+"))(Connector (WordNode "j")(ConnectorDir "+"))(Connector (WordNode "t")(ConnectorDir "+")))
Not in incoming for DB (Connector (WordNode "c")(ConnectorDir "-"))

(cog-rocks-get "l@(ConnectorSeq (Connector (WordNode \"c\")(ConnectorDir \"-\"))(Connector (WordNode \"p\")(ConnectorDir \"+\"))(Connector (WordNode \"c\")(ConnectorDir \"+\"))(Connector (WordNode \"j\")(ConnectorDir \"+\"))(Connector (WordNode \"t\")(ConnectorDir \"+\")))")
dR92

OK -- fake_pairs has one bad link ... out of 480
Ahhh!
Mising sid M1 (EvaluationLink (LinkGrammarRelationshipNode "ANY")(ListLink (WordNode "p")(WordNode "d")))
Not in incoming for K1 (ListLink (WordNode "p")(WordNode "d"))

(cog-rocks-get "i@K1:")
rkey: >>i@K1:EvaluationLink<< rval: >>G1 <<
(cog-rocks-get "a@G1")
rkey: >>a@G1<< rval: >>(EvaluationLink (LinkGrammarRelationshipNode "ANY")(ListLink (WordNode "p")(WordNode "d")))<<

... M1 and G1 are the same atom... Ooops!

Hypothesis: two threads are writing the same atom; neither thread finds
the l@ or n@ record, so each one starts a new one (a second sid for the
same atom).
200 looks it up. 235 writes it.

17008
expt-3/bad-again/mpg_parse-no-margs.rdb has
  Next aid: 870677
  Atoms/Links/Nodes a@: 870676 l@: 870624 n@: 34
  Keys/Incoming/Hash k@: 476041 i@: 395182 h@: 0
Same M1/G1 from above and three more... total of 4

expt-3/bad-again/mpg_parse.rdb (with marginals) has

----------
Try again: stock rocksdb-5.17 from ubuntu focal
mounted on ssd

djgeneration is now:
OLD speed:
MPG-Processing file >>>corpus-7.txt<<<
Sent out article in 259 seconds
MPG-Processing file >>>corpus-11.txt<<<
Sent out article in 648 seconds
MPG-Processing file >>>corpus-12.txt<<<
Sent out article in 1134 seconds
MPG-Processing file >>>corpus-10.txt<<<
Sent out article in 1185 seconds
MPG-Processing file >>>corpus-8.txt<<<
Sent out article in 916 seconds
MPG-Processing file >>>corpus-9.txt<<<
Sent out article in 1218 seconds
real 90m16.308s

NEW speed:
MPG-Processing file >>>corpus-7.txt<<<
Sent out article in 257 seconds
MPG-Processing file >>>corpus-11.txt<<<
Sent out article in 502 seconds
MPG-Processing file >>>corpus-12.txt<<<
Sent out article in 958 seconds
MPG-Processing file >>>corpus-10.txt<<<
Sent out article in 977 seconds
MPG-Processing file >>>corpus-8.txt<<<
Sent out article in 774 seconds
MPG-Processing file >>>corpus-9.txt<<<
Sent out article in 1020 seconds
real 75m45.906s
real 80m52.774s

-----------
Restarted pair counting from scratch in expt-4
Why is this so slow!???
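Side note on the M1/G1 hypothesis above (two threads store the same atom,
neither finds the l@/n@ record, both allocate a sid): the usual cure is to
make "look up, else allocate" one atomic step. Below is a minimal Python
sketch of that idea only. It is not the RocksIO.cc code; SidTable,
get_or_create and store_concurrently are made-up names, and a plain dict
stands in for the l@/n@ records.

```python
import threading


class SidTable:
    """Toy stand-in for the l@/n@ records: maps an atom s-expression to a sid."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sid_of = {}      # hypothetical in-memory replacement for l@/n@ keys
        self._next_aid = 1     # plays the role of "Next aid"

    def get_or_create(self, sexpr):
        # Lookup and allocation happen under one lock, so two threads can
        # never both miss the lookup and then issue two sids for one atom.
        with self._lock:
            sid = self._sid_of.get(sexpr)
            if sid is None:
                sid = self._next_aid
                self._next_aid += 1
                self._sid_of[sexpr] = sid
            return sid

    def next_aid(self):
        with self._lock:
            return self._next_aid


def store_concurrently(table, atoms, nthreads=8):
    """Have every thread store every atom; return the sids each thread saw."""
    seen = [[] for _ in range(nthreads)]

    def worker(i):
        for atom in atoms:
            seen[i].append(table.get_or_create(atom))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    atoms = ['(WordNode "%s")' % c for c in "abcdefgh"]
    table = SidTable()
    seen = store_concurrently(table, atoms)
    print("every thread saw the same sids:", all(s == seen[0] for s in seen))
```

If the lock is dropped and the lookup and the write become separate steps,
two threads can interleave between them, which is the duplicate-sid pattern
suspected here.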
Splitting and word-pair counting file >>>corpus-7.txt<<<
Sent out article in 1764 seconds
Splitting and word-pair counting file >>>corpus-11.txt<<<
Sent out article in 3788 seconds
Splitting and word-pair counting file >>>corpus-12.txt<<<
Sent out article in 4706 seconds

24GB of RAM at this point
running at 680% cpu
$ du -s *
292816 fake_pairs.rdb
guile looks OK. Something is leaking RAM.

Splitting and word-pair counting file >>>corpus-10.txt<<<
Sent out article in 4025 seconds
Splitting and word-pair counting file >>>corpus-8.txt<<<
Sent out article in 3333 seconds
Splitting and word-pair counting file >>>corpus-9.txt<<<
Sent out article in 3654 seconds
real 359m9.453s

(gc-time-taken . 10549268042307)      ; Yow!!!
(heap-size . 42799104)                ; 43MB
(heap-free-size . 16207872)
(heap-total-allocated . 449802881088) ; Yow! 450GB!!! what the!?
(heap-allocated-since-gc . 2444768)
(protected-objects . 51)
(gc-times . 35781))

  PID USER   PR NI  VIRT   RES   SHR S %CPU %MEM   TIME+ COMMAND
12795 ubuntu 20  0 40.5g 37.6g 38708 S  0.3 14.9 2370:57 guile

Total of 445 atoms in atomspace ...

djparsing: Same as "NEW speed", above.
real 78m48.721s
$ du -s *
728 fake_pairs.rdb
532628 mpg_parse.rdb

disjunct marginals:
(define pca (make-pseudo-cset-api))
(define psa (add-pair-stars pca))
(define btr (batch-transpose psa))
(psa 'fetch-pairs)
Elapsed time to load csets: 54 secs
((add-support-compute psa) 'cache-all)
Finished left norm marginals in 176 secs
Finished left totals in 17 secs
Finished right norm marginals in 25 secs
Finished right totals in 0 secs

piss-ant slow as before:
(btr 'mmt-marginals)
Stored 240000 of 370186 left-wilds in 321 secs (125 pairs/sec)
Stored 280000 of 370186 left-wilds in 366 secs (109 pairs/sec)
Stored 320000 of 370186 left-wilds in 423 secs (95 pairs/sec)
Done storing 370186 left-wilds in 2511 secs

--------------
expt-5 is the postgres version

apt install postgresql-client postgresql pgtop libpq-dev

Do NOT do these opts in /etc/postgresql/12/main
  effective_cache_size = 60GB
  seq_page_cost = 0.1
  random_page_cost = 0.1
  checkpoint_completion_target = 0.9
  effective_io_concurrency = 100
  max_worker_processes = 24

Throw to key `C++-EXCEPTION' with args `("sql-create" "Failed to execute SQL command!\nPQresult message: ERROR: database \"foo\" already exists\n\nPQ query was: CREATE DATABASE foo; (/home/ubuntu/src/atomspace/opencog/persist/sql/multi-driver/ll-pg-cxx.cc:130)\nFunction args:\n(postgres:///foo)")'.
Throw to key `C++-EXCEPTION' with args `("sql-create" "Cannot connect to database: FATAL: database \"foo\" does not exist\n (/home/ubuntu/src/atomspace/opencog/persist/sql/multi-driver/ll-pg-cxx.cc:54)\nFunction args:\n(postgres:///foo)")'

postgres:///fake_pairs
(use-modules (opencog) (opencog persist) (opencog persist-sql))
(sql-create "postgres:///fake_pairs")

Splitting and word-pair counting file >>>corpus-7.txt<<<
Sent out article in 1707 seconds
Splitting and word-pair counting file >>>corpus-11.txt<<<
Sent out article in 3007 seconds
Splitting and word-pair counting file >>>corpus-12.txt<<<
Sent out article in 3455 seconds
Splitting and word-pair counting file >>>corpus-10.txt<<<
Sent out article in 3196 seconds
Splitting and word-pair counting file >>>corpus-8.txt<<<
Sent out article in 2479 seconds
Splitting and word-pair counting file >>>corpus-9.txt<<<
Sent out article in 2738 seconds
real 279m53.779s

Much faster than the RocksDB version but still stunningly slow...

  PID USER   PR NI  VIRT   RES   SHR S %CPU %MEM   TIME+ COMMAND
23108 ubuntu 20  0 39.1g 37.3g 30760 S  0.0 14.8 1958:42 guile

Total of 445 atoms in atomspace ...

(gc-stats)
((gc-time-taken . 7407763599459)       ;; 7407 seconds = 2 hours whoa
 (heap-size . 33619968)
 (heap-free-size . 13774848)
 (heap-total-allocated . 449690164592) ;; 450 GBytes just like rocks version
 (heap-allocated-since-gc . 6156032)
 (protected-objects . 51)
 (gc-times . 29863))

OK, so the mem explosion was a link-grammar init-table bug.
https://github.com/opencog/link-grammar/pull/1149

Also tweaked the postgres config:
Splitting and word-pair counting file >>>corpus-7.txt<<<
Sent out article in 1466 seconds
Splitting and word-pair counting file >>>corpus-11.txt<<<
Sent out article in 2996 seconds

after parsing 96044 sentences --
((gc-time-taken . 3341717491309) <<< 3341 seconds ouch!
 (heap-size . 33153024)
 (heap-free-size . 18305024)
 (gc-times . 19226)) << so one gc every 4.995 sentences
... is this hard-coded somewhere?
`sometimes-gc` and `maybe-gc`

(report-avg-gc-cpu-time)
Elapsed: 5544.8 secs. Rate: 213.7 gc/min %cpu-GC: 63.66% %cpu-use: 696.6%

=============================
current status

postgres word pair counting: ...
real 279m2.812s == 16743 secs wall clock
total of 290676 sentences
so: 17.36 sentences/second
cputime= 1896:14 = 113774 secs so 6.8 threads avg.
cpu processing is 2.55 sents per cpu sec.
(gc-time-taken . 8468975661196) = 8468 secs = 7.4% in gc ...
(gc-times . 32526)) = 8.9 sentences per gc
(heap-total-allocated . 449898824256) = 450GB = 1.5MB/sentence

vs same dataset, RocksDB: ...
real 313m46.959s = 18827 secs wall clock
total of 290676 sentences
so: 15.4 sentences/second
cputime= 2098:46 = 125926 secs so 6.7 threads avg
cpu processing is 2.3 sents per cpu-sec
(gc-time-taken . 9867992214539) = 9868 secs = 7.8% in gc
(gc-times . 64144) = 4.5 sentences per gc
(heap-total-allocated . 449155740720) = 449GB = 1.5MB/sentence
why so much??

-----------
disjunct counting ... cogserver is nearly idle...
MPG-Processing file >>>corpus-7.txt<<<
Can't connect to port 18108!
Seems to be waiting on postgres, which is waiting on disk, because
rocks in another process is hogging the disk bandwidth!?

This is the non-optimized postgres...
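The derived figures in the postgres-vs-RocksDB status summary above are
easy to recheck mechanically. A small Python sketch, assuming (as these
notes do) that Guile's gc-time-taken counts nanoseconds and that top's
TIME+ column reads as minutes:seconds; cpu_time_to_secs and summarize are
helper names made up for this check:

```python
def cpu_time_to_secs(mmss):
    """Convert a top-style 'MMMM:SS' TIME+ string into whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(sentences, wall_secs, cpu_secs, gc_ns, gc_times, heap_bytes):
    """Derive the per-run rates quoted in the status summary."""
    gc_secs = gc_ns / 1e9  # assumption: gc-time-taken is in nanoseconds
    return {
        "sents_per_sec": sentences / wall_secs,
        "avg_threads": cpu_secs / wall_secs,
        "sents_per_cpu_sec": sentences / cpu_secs,
        "gc_pct_of_cpu": 100.0 * gc_secs / cpu_secs,
        "sents_per_gc": sentences / gc_times,
        "heap_mb_per_sent": heap_bytes / sentences / 1e6,
    }


# Inputs copied from the two runs recorded above.
pg = summarize(290676, 279 * 60 + 2.812, cpu_time_to_secs("1896:14"),
               8468975661196, 32526, 449898824256)
rocks = summarize(290676, 313 * 60 + 46.959, cpu_time_to_secs("2098:46"),
                  9867992214539, 64144, 449155740720)

for name, stats in (("postgres", pg), ("rocks", rocks)):
    print(name)
    for key, value in stats.items():
        print("  %-18s %.2f" % (key, value))
```

Converting 2098:46 this way gives 125926 seconds; at the precision quoted
in the summary, the threads-average, per-cpu-second and gc-percentage
figures for the RocksDB run come out the same.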
MPG-Processing file >>>corpus-7.txt<<< Sent out article in 851 seconds MPG-Processing file >>>corpus-11.txt<<< Sent out article in 1485 seconds MPG-Processing file >>>corpus-12.txt<<< Sent out article in 1513 seconds MPG-Processing file >>>corpus-10.txt<<< Sent out article in 1141 seconds MPG-Processing file >>>corpus-8.txt<<< Sent out article in 876 seconds MPG-Processing file >>>corpus-9.txt<<< Sent out article in 1004 seconds real 115m55.803s (this is the non-optimized postgres) DB activity: 587 tps, 0 rollbs/s, 6 buffer r/s, 99 hit%, 941 row r/s, 47 DB activity: 638 tps, 0 rollbs/s, 5 buffer r/s, 99 hit%, 1132 row r/s, 613 row w/s DB activity: 675 tps, 0 rollbs/s, 4 buffer r/s, 99 hit%, 1275 row r/s, 683 row w/s DB I/O: 0 reads/s, 0 KB/s, 1169 writes/s, 5101 KB/s Total DISK READ: 0.00 B/s | Total DISK WRITE: 2.55 M/s Current DISK READ: 0.00 B/s | Current DISK WRITE: 2.53 M/s Why is pg_top reporting twice the rate of iotop? Is this an mdraid thing? PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND 11723 postgres 20 0 217M 12M sleep 0:04 4.21% 8.74% postgres: 12/mai 11720 postgres 20 0 217M 12M sleep 0:09 5.48% 8.54% postgres: 12/mai 11727 postgres 20 0 217M 12M sleep 0:09 6.37% 8.34% postgres: 12/mai 11728 postgres 20 0 217M 12M run 0:07 4.63% 8.34% postgres: 12/mai 11722 postgres 20 0 217M 12M sleep 0:09 5.07% 6.95% postgres: 12/mai 11729 postgres 20 0 217M 12M sleep 0:03 3.44% 5.36% postgres: 12/mai 11732 postgres 20 0 218M 16M sleep 0:09 4.77% 0.79% postgres: 12/mai 24851 postgres 20 0 215M 6144K sleep 0:30 0.06% 0.00% postgres: 12/mai 11725 postgres 20 0 217M 12M sleep 0:08 4.22% 0.00% postgres: 12/mai ((PredicateNode . 15) (ListLink . 240) (AnyNode . 2) (Connector . 29) (ConnectorDir . 2) (ConnectorSeq . 371468) (Section . 466340) (EvaluationLink . 240) (TypeNode . 1) (AnchorNode . 1) (SchemaNode . 1) (PostgresStorageNode . 1) (WordNode . 15) (LinkGrammarRelationshipNode . 
1)) (psa 'fetch-pairs) Elapsed time to load csets: 65 secs ((add-support-compute psa) 'cache-all) (btr 'mmt-marginals) Done storing 371468 left-wilds in 1567 secs Done storing 371515 left-wilds in 1722 secs OK, so that's faster than RocksDB, which took 2511 secs rocks started out faster, then really slowed down... Why? .... why? update: the incoming set. After redesign, rocks is faster. Also, why is the number different? (define rpt-obj (add-report-api psa)) (define size (rpt-obj 'num-pairs)) (define size (rpt-obj 'left-dim)) (define dim-key (PredicateNode "*-Dimension Key-*")) (define wild-atom (psa 'wild-wild)) (define (get-left-dim) (inexact->exact (round (cog-value-ref (cog-value wild-atom dim-key) 0)))) support stuff was not stored ...!? Why? ((add-support-compute psa) 'cache-all) fails to compute the dim key well, OK, central-compute is needed for that... (define size (rpt-obj 'num-pairs)) same as (psu 'total-support-left or total-support-right (define nrows (rpt-obj 'left-dim)) same as (psa 'left-basis-size) (define ncols (rpt-obj 'right-dim)) ----- expt-6 valgrind corpus-3.txt ==29958== definitely lost: 3,151,147 bytes in 67 blocks ==29958== indirectly lost: 310 bytes in 10 blocks ==29958== possibly lost: 139,792 bytes in 26 blocks ==29958== by 0x120C2218: init_table (count.c:216) ==29958== by 0x120C2218: alloc_count_context (count.c:1333) ==29958== by 0x120C4EDB: classic_parse (parse.c:417) ==29958== by 0x1209EBFF: sentence_parse (api.c:698) ==29958== by 0x12065C49: opencog::LGParseLink::execute(opencog::AtomSpace*, bool) (LGParseLink.cc:173) corpus-4.txt ==25891== definitely lost: 14,753,451 bytes in 242 blocks ==25891== indirectly lost: 3,434,581 bytes in 289 blocks ==25891== possibly lost: 1,188,368 bytes in 34 blocks 140 is ulimit 557 848 1865 -> 1185 OK, so 5481 sents in real 5m43.822s so 16 sents/second But only 1017 thread frees...!? with valgrind ... stopped at 1017 thread frees ... Ohh duhh bad code. 
OK, so 5481 sents in real 3m24.520s = 204 secs so 27 sents/sec thats better. and ram looks good but gc-time-taken=115 secs so wow too much. (gc-times . 1342) maybe its too much OK fixed the leak, it seems: Again: with rocks: Splitting and word-pair counting file >>>corpus-7.txt<<< Sent out article in 1938 seconds Splitting and word-pair counting file >>>corpus-11.txt<<< Sent out article in 4147 seconds ((gc-time-taken . 2683799849797) (heap-size . 22302720) (gc-times . 14410)) at 90686 sentences parsed... so one gc every 6.3 sentences Elapsed: 6474.5 secs. Rate: 134.7 gc/min %cpu-GC: 41.82% %cpu-use: 573.7% vast amounts of gc because vast amounts of heap... who is using this? -------------------------------------- expt-8 (define pca (make-pseudo-cset-api)) (define psa (add-pair-stars pca)) (psa 'fetch-pairs) (psa 'left-basis) (psa 'right-stars (WordNode "mouse")) dog cat mouse bird squirrel: the- or (the- & chased+ & the+) or (the- & saw+ & the+) the: LEFT-WALL- & (cat+ or squirrel or mouse+ or bird+ or dog+) saw chased: (bird- or cat- or ...) & the+; ~/src/learn/run/4*/run-gram-cogserver.sh (gram-classify-greedy-discrim 0.5 4) (cog-get-atoms 'WordClassNode) (cog-get-root (WordClassNode "squirrel dog")) (cog-get-root (WordClassNode "saw chased")) (star-obj 'right-stars (Word "saw")) (star-obj 'right-duals (Word "saw")) saw chased: (gram-classify-greedy-fuzz 0.65 0.3 4) /usr/local/share/guile/3.0 [GenericShell] evaluator error: ice-9/boot-9.scm: (quit-exception? 
(apply throw 'quit args)) _evaluator->eval_error() _caught_error _error_string scm_primitive_exit(0) foo get_quit_exception scm_exit_status _wait_done.notify_all(); restore_output termios libguile/posix.c: ~/src/learn/run/4*/run-gram-cogserver.sh duude if=2502 duude of=5 duude cf=bf duude lf=8a3b must mst set ISIG ICANON ECHO duude if=22402 must set ICRNL IXON IUTF8 duude of=5 duude cf=277 duude lf=105073 duude if=42400 ------------------ Sexpr::decode_value RocksStorage::getKeys ------------------ After triming, 1 words left, out of 10 echo $GRAM_CLUSTER | nc $HOSTNAME $PORT (define gca (make-gram-class-api)) (define gcs (add-pair-stars gca)) (define gcf (add-wordclass-filter gcs)) after clustering do (gca 'fetch-pairs) (define cset-obj (make-pseudo-cset-api)) (define psa (add-pair-stars cset-obj)) (define asc (add-singleton-classes psa)) (asc 'create-hi-count-singles 1) (batch-all-pair-mi gcf) (gcs 'left-basis) (gcf 'left-basis) (length (gcf 'right-basis)) (length (gcf 'right-duals (WordClassNode "squirrel dog"))) (length (gcf 'right-stars (WordClassNode "squirrel dog"))) (batch-all-pair-mi gcf) (use-modules (opencog nlp lg-export)) (export-csets gcf "dict.db" "EN_us") monitor-parse-rate modified: depcomp modified: install-sh modified: missing commit dfc4362417bf9aa3468a29360681f13f76949ac6 $(libguile_dbi_la_LINK) libguile_dbi_la_LDFLAGS = -export-dynamic -version-info 2:6:0 libguile_dbi_la_LDFLAGS = -export-dynamic -version-info @DBI_INTERFACE@ src/guile-dbd-mysql.c:120 || || || LEFT-WALL.2 squirrel.3 a.1 Many duplicates!!!! <###LEFT-WALL####uni>|TB+ & TB+|-2.58496250072116 |TD- & TC- & TB- & TE+|-1.58496250072116 |TB- & TE+|-1.58496250072116 |TE-|-1.58496250072116 |TE- & TF+ & TC+|-1.58496250072116 |TF- & TD+|-2.58496250072116 The duplicate are because the connectors were never consolidated into classes. cset-to-lg-dj unique export-dictionary.scm LG_DICT_EXPORT EN_us linkparser> the squirrel a dog Found 1 linkage (1 had no P.P. 
violations) Unique linkage, cost vector = (UNUSED=0 DIS=-8.92 LEN=2) +-----------TB-----------+ +---TB---+---TE---+ +-TE-+ | | | | | LEFT-WALL.2 the.1 squirrel.3 a.1 dog.3 linkparser> the squirrel saw a dog Found 1 linkage (1 had no P.P. violations) Unique linkage, cost vector = (UNUSED=0 DIS=-11.51 LEN=4) +--------------TB--------------+ | +-----TC-----+ +---TB---+---TE---+---TF--+-TD-+-TE-+ | | | | | | LEFT-WALL.2 the.1 squirrel.3 saw.4 a.1 dog.3 $ ~/src/learn/run/3*/run-mst-cogserver.sh (define (prt-atom h) (display h) #f) (cog-map-type prt-atom 'WordNode) (WordNode "###LEFT-WALL###" (ctv 1 0 4800)) (WordNode "saw" (ctv 1 0 2400)) (WordNode "the" (ctv 1 0 4800)) (cog-incoming-set (WordNode "###LEFT-WALL###")) (define ala (make-any-link-api)) (define asa (add-pair-stars ala)) (define als (add-support-api asa)) (define alf (add-pair-freq-api als)) (ala 'fetch-pairs) (ala 'pair-count (WordNode "###LEFT-WALL###") (WordNode "the")) (alf 'help) ,d add-pair-freq-api (alf 'left-count (WordNode "the")) (alf 'wild-wild-count) (alf 'pair-fmi (alf 'get-pair (Word "###LEFT-WALL###") (Word "the"))) FMI's: wall-the 1.411 (alf 'get-pair (Word "###LEFT-WALL###") (Word "saw")) none-such! (als 'get-all-elts) (als 'right-stars (Word "###LEFT-WALL###")) (als 'right-duals (Word "###LEFT-WALL###")) Why only the determiners? Was this filtered!? No ... then what? link-pipeline.scm (define phr (Phrase "the dog chased the cat")) (define lgn (LgParseMinimal phr (LgDict "any") (Number 24))) (define sent (cog-execute! lgn)) (sentence-get-parses sent) (define parse (list-ref (sentence-get-parses sent) 12)) (parse-get-links parse) 1: wall-the the-dog dog-chase chase-the the-cat (for-each (lambda (parse) ; (format #t "parse ~A" parse) (for-each (lambda (link) (if (equal? (WordNode "###LEFT-WALL###") (word-inst-get-word (gadr link))) (if (and (not (equal? (WordNode "the") (word-inst-get-word (gddr link)))) (not (equal? 
(WordNode "dog") (word-inst-get-word (gddr link)))))
                  (format #t "its ~A ~A" (word-inst-get-word (gddr link)) parse))))
         (parse-get-links parse)))
   (sentence-get-parses sent))

Hmm parse #156 is the first parse that links to "dog"
parse #225 first to link to chased
#393 first to link to "cat"
total of 1162 linkages
so this sampling is extremely non-random! and no parses link elsewhere.

OK this was a bug in LgParseLink and was fixed.

No LEFT-WALL is being exported .. why?
Oh, need to import disjuncts .. fixed.

--------
expt-12
works but then
guile-gram-fake> Done clustering
Rocks: initial aid=441673   <<<<<<<<< wow that's pretty big... too big!?
Support: found num left= 2 num right= 27798 in 7 secs
Total count N(*,*) = 372848.5999
...
Done storing 2 right-wilds in 0 secs
Done saving -log P(x,*) and P(*,y)
Going to compute and store individual pair MI.
Backtrace:
In srfi/srfi-1.scm:
   586:29 19 (map1 (# # # # # # # # # # # # # # # # # # # # # # # # …))
In opencog/matrix/compute-mi.scm:
   345:56 1 (do-one-pair #)
In unknown file:
   0 (cog-value-ref #f 0)

ERROR: In procedure cog-value-ref:
In procedure cog-value-ref: Wrong type argument in position 1 (expecting opencog value): #f
Failure exporting the dictionary!

guile -l ~/src/learn/run-common/cogserver.scm
(define gca (make-gram-class-api))
(gca 'fetch-pairs)
(define gcs (add-pair-stars gca))
(define gcf (add-wordclass-filter gcs))
(define frqobj (add-pair-freq-api gcf #:nothrow #t))

(define left-item (WordClassNode "b f"))
(define right-item (ConnectorSeq
   (Connector (WordNode "d") (ConnectorDir "-"))
   (Connector (WordNode "a") (ConnectorDir "+"))))
(define lipr (gca 'get-pair left-item right-item))
(define pr-freq (frqobj 'pair-freq lipr))

Ah hah frequencies missing on gcf.
make-compute-freq .. turns out no, counts are missing.

batch-all-pair-mi already does
(define freq-obj (make-compute-freq wild-obj)) at line 573
line 620 is (freq-obj 'init-freq)
prints Done computing 30026 pair frequencies in 3 secs
so wtf ...
line 624 called (freq-obj 'cache-all-pair-freqs) MI starts line 645 Jumps to line 297 (define supobj (add-support-api gcf)) (cog-keys lipr) shows no keys. ... not even a TV ...!? (define mcf (make-compute-freq gcf)) line 258... do it by hand (mcf 'cache-all-pair-freqs) .. fails... at line 232 (for-each right-loop (wldobj 'left-basis)) where wildobj == gcs so line 227 fails... (cache-pair-freq lipr) ... line 203 (compute-pair-freq at line 189 fails cause no count on the item. why does lipr not have a TV on it??? How many (gcs 'right-stars left-item) are missing keys? (use-modules (srfi srfi-1)) (fold (lambda (star cnt) (if (nil? (cog-keys star)) (+ 1 cnt) cnt)) 0 (gcs 'right-stars left-item)) answer 998 of them! out of 45186 .. how? Oh. Use nil? not null? Use nil? not null? for cog-value-ref; it returns #f or '() fixed in f29b3dc52 opencog/atomspace Try again. . config/3*sh guile -l ~/src/learn/run-common/cogserver.scm (define cset-obj (make-pseudo-cset-api)) (define star-obj (add-pair-stars cset-obj)) (cset-obj 'fetch-pairs) (define right-item (ConnectorSeq (Connector (WordNode "d") (ConnectorDir "-")) (Connector (WordNode "a") (ConnectorDir "+")))) (star-obj 'left-stars right-item) There are 10 of these. All of these have counts on them. So it was the classifier that busted things. mpg_parse.rdb has MM^T on it but no support, no MI . config/4*sh guile -l ~/src/learn/run-common/cogserver.scm (star-obj 'left-stars right-item) Only one is left. Try (gram-classify-greedy-discrim 0.5 4) see what happens... (gcs 'left-stars right-item) has two items, both have counts. try again, manually. (gram-classify-greedy-fuzz 0.65 0.3 4) (define gca (make-gram-class-api)) (define gcs (add-pair-stars gca)) (gcs 'left-stars right-item) has two items, both have counts. (Section (ctv 1 0 683) (WordClassNode "b f") (Section (ctv 1 0 1340) (WordClassNode "e j") wtf ... was there a store problem? yes ... exit server, close, re-open, have no TV on (Section (WordClassNode "b f") why not? 
Is this a rocks bug or a script bug? (cog-rocks-open "rocks:///home/ubuntu/data/expt-12/gram-4.rdb") (cog-rocks-get "l@(Section (WordClassNode \"b f\")(ConnectorSeq (Connector (WordNode \"d\")(ConnectorDir \"-\"))(Connector (WordNode \"a\")(ConnectorDir \"+\"))))") (cog-rocks-get "k@rXg1") ... returns nothing. even after immediately computing things. So it was never saved...!? need to (gcs 'store-aux) ??? This doesn't fix it. (gram-classify greedy-over-words (make-fuzz make-fuzz stores some stuff... ... seems to store the revised marginals on the words and the class what about the section itself? gram-projective.scm explicit store works. (define gca (make-gram-class-api)) (define gcs (add-pair-stars gca)) (define gco (make-store gcs)) (gco 'store-all) with explicit store: select count(*) from disjuncts; 1754 without explicit store: 1530 clustered exported db generates no sentences. non-clustered db works. clustered db is missing left-wall. Why? (define gca (make-gram-class-api)) (define gcs (add-pair-stars gca)) (gcs 'fetch-pairs) (define (prt-atom h) (display h) #f) (cog-map-type prt-atom 'Word) (cog-map-type prt-atom 'WordClass) No uni for left wall? (define pca (make-pseudo-cset-api)) (define psa (add-pair-stars pca)) (psa 'fetch-pairs) (define asc (add-singleton-classes psa)) (asc 'create-hi-count-singles 1) After trimming, 9 words left, out of 11 Created 0 singleton word classes so no keys on left-wall (use-modules (srfi srfi-1)) (define lw (WordNode "###LEFT-WALL###")) (length (cog-incoming-set lw)) ; 19887 (list-ref (cog-incoming-set lw) 2) (cog-keys (list-ref (cog-incoming-set lw) 333)) ; None!!! wtf! for gram-5, at least ... ... but mpg_parse does have counts on left-wall. How did they disappear? issue: failed to wait for cogserver to finish loading. 2-word-pairs/run-all.sh 3-mst-parsing/run-all-mst.sh 4-gram-class/run-all-gram.sh all-in-one.sh echo -e "(block-until-idle 0.01)\n.\n." 
another issue: export prints Going to compute and store individual pair MI. but that's a lie, its never stored... no, ok, it just doesn't do the progress prints, that's all... DO-STORE is true... ... huh, claims it is stored... is it? so, yeah they are there... but wait .. (gcs 'left-stars right-item) shows that some unis have no counts... wtf again, done in gram-7 now export-7 (define gca (make-gram-class-api)) (define gcs (add-pair-stars gca)) (define right-item (ConnectorSeq (Connector (WordNode "d") (ConnectorDir "-")) (Connector (WordNode "a") (ConnectorDir "+")))) (gcs 'left-stars right-item) only two items, they have counts but no MI's on them. (use-modules (srfi srfi-1)) (fold (lambda (star cnt) (if (nil? (cog-keys star)) (+ 1 cnt) cnt)) 0 (gcs 'get-all-elts)) ; 0 OK everything has a count. good (length (gcs 'get-all-elts)) ; 93056 (define pca (make-pseudo-cset-api)) (define psa (add-pair-stars pca)) (psa 'fetch-pairs) (length (psa 'get-all-elts)) ; 32383 (fold (lambda (star cnt) (if (nil? (cog-keys star)) (+ 1 cnt) cnt)) 0 (psa 'get-all-elts)) ; 9869 ok that's broken, but maybe from the earlier breakages. Need rerun from scratch. After re-running from scratch, get (length all ...; 90809 (fold ... ; 26245 for the fold. Are these the zero-count disjuncts!? why aren't they deleted? WTF?? for gcs: 94984 total elts, and 25329 without counts... ----------- try to figure it out. expt-13 cp -pr mpg_parse.rdb gram-2.rdb (define pca (make-pseudo-cset-api)) (define psa (add-pair-stars pca)) (psa 'fetch-pairs) (use-modules (srfi srfi-1)) (length (psa 'get-all-elts)) ;134089 (fold; 0 very good. cluster (gram-classify-greedy-fuzz 0.65 0.3 4) (length (psa 'get-all-elts)) ;32625 wow big drop but expected OK (fold ; 0 so still good! (define pca2 (make-pseudo-cset-api)) (define psa2 (add-pair-stars pca2)) (length (psa2 'get-all-elts)) ; 32625 OK consistent (fold ; 0 still good. ((make-store (make-gram-class-api)) 'store-all) close, restart cogserver load. 
(length (psa 'get-all-elts)) ; 32625 OK, no change (use-modules (srfi srfi-1)) (fold (lambda (star cnt) (if (nil? (cog-keys star)) (+ 1 cnt) cnt)) 0 (psa 'get-all-elts)) fold ; 9685 OK, so ... TV's disappeared. How? ------ gram-3 (use-modules (srfi srfi-1)) (fold (lambda (star cnt) (if (nil? (cog-keys star)) (+ 1 cnt) (if (nil? (cog-value star (Predicate "*-TruthValueKey-*"))) (+ 1 cnt) cnt))) 0 (psa 'get-all-elts)) 0 -- so they all have TV's. (gram-classify-greedy-fuzz 0.65 0.3 4) (fold ; -- still zero. ((make-store psa) 'store-all) close, reopen. (length (psa 'get-all-elts)) ; 32625 OK good. fold ; 0 Yayyy! well, that solves that! ------ gram-4b LEFT-WALL c d g LEFT-WALL j f j LEFT-WALL h a a LEFT-WALL b b h LEFT-WALL c e d LEFT-WALL j c g LEFT-WALL a j f LEFT-WALL g j b LEFT-WALL d a b LEFT-WALL f d a LEFT-WALL f g d LEFT-WALL f j g LEFT-WALL f g b LEFT-WALL c b e LEFT-WALL b a h LEFT-WALL d b b LEFT-WALL a h c LEFT-WALL e f c LEFT-WALL j b d +-------TB------+ | +---TC--+ +---TB--+-TE+-TD+ | | | | LEFT-WALL.2 c.1 d.3 g.1 +-------TB------+ +---TB--+-TE+-TD+ | | | | LEFT-WALL.2 c.1 d.3 g.1 +-------TB------+ +-----TH----+ | +---TB--+-TE+-TD+ | | | | LEFT-WALL.2 c.1 d.3 g.1 j f j is swapped B-H and D-E ./gen-dict.scm ../run-config/1-dict-conf.scm foo (base-26 n #t) ~/src/learn/run/5-compare/dict-comp.scm fake-lang learned-gram-4c fake-corpus/corpus-3.txt --------------------- add-linkage-filter cset-class.scm: -- merges, but no define-public connector-seq-compare .. its dead code... gram-agglo.scm: gram-projective.scm: shape-vec.scm: (load "/home/ubnutu/src/learn/scm/cset-class.scm") Who writes MemberLinks? gram-projective.scm does ... so does gram-class-api.scm ... merge-project only does the sections, not shapes. Why not? I think it could. It creates the new sections directly... thats a bug. What is merge-discrim? 
(define wsv (make-shape-vec-api)) (wsv 'fetch-pairs) (define wss (add-pair-stars wsv)) (wss 'left-basis-size) ((add-support-compute wss) 'cache-all) (define wst (batch-transpose wss)) (wst 'mmt-marginals) (define s (EvaluationLink (PredicateNode "*-word-shape pair-*") (WordNode "!") (EvaluationLink (PredicateNode "*-shape-*") (WordNode "d") (Connector (WordNode "d") (ConnectorDir "-")) (Connector (WordNode "b") (ConnectorDir "-")) (Connector (WordNode "e") (ConnectorDir "-")) (Connector (WordNode "b") (ConnectorDir "-")) (Connector (WordNode "a") (ConnectorDir "+")) (Connector (VariableNode "$connector-word") (ConnectorDir "+"))))) (wss 'left-stars s) (cog-atom? x) f-left-star-pat 'left-star-pattern ((add-support-compute wss) 'cache-all) Finished left norm marginals in 271 secs Finished left totals in 34 secs Finished left norm marginals in 1295 secs duude failure not atom at 92387 of 101856 (define e (WordNode "e")) (define sa (EvaluationLink (PredicateNode "*-shape-*") (WordNode "a") (Connector (WordNode "e") (ConnectorDir "-")))) (wsv 'get-pair e sa) no stars for .... cause its not an item, its a pair! (wss 'left-basis) (define s (car (wss 'right-basis))) (wss 'left-stars s) (cog-incoming-set pair-pred) (WordNode "d") (define rs (wss 'right-stars (WordNode "d"))) 92387 of 101856 (list-ref rs 92387) (define var (Variable "$api-right-star")) (define term (wss 'make-pair (WordNode "d") var)) (define b (Bind (TypedVariable var (Type "EvaluationLink")) term term)) (use-modules (opencog exec)) (define setl (cog-execute! b)) (define o (cog-outgoing-set setl)) (list-ref o 92387) (define term (EvaluationLink (PredicateNode "*-word-shape pair-*") (WordNode "d") (VariableNode "$api-right-star"))) which is wrong. . (cog-incoming-set term) Doesn't happen when fresh. (length o) ; 101855 insead of 101856 (for-each (lambda (atm) (if (equal? 
atm term) (format #t "yes its here\n"))) o) (all-left-marginals) (maybe-par-for-each set-left-marginals (star-obj 'right-basis)) (define wsc (add-support-compute wss)) (for-each (lambda (atm) (wsc 'set-left-marginals atm)) (wss 'right-basis)) still good. so OK. (wsc 'total-support-left) still OK (wsc 'total-count-left) still OK (define wap (add-support-api wss)) (wap 'set-left-totals (wsc 'total-support-left) (wsc 'total-count-left)) still OK (wsc 'all-left-marginals) still OK ... so wtf is going on?? ---- (for-each (lambda (atm) (wsc 'set-right-marginals atm)) (wss 'left-basis)) and its kaboom. duude failure not atom at 11440 of 39214 duude wtf right fail for (WordNode "b") although the bindlink is OK still. ---- (define wsv (make-shape-vec-api)) (wsv 'fetch-pairs) (define wss (add-pair-stars wsv)) (wss 'left-basis-size) (define wsc (add-support-compute wss)) (for-each (lambda (atm) (wsc 'set-right-marginals atm)) (wss 'left-basis)) duude failure not atom at 92387 of 101856 so very reproducible. Not raceey Direct: (define wsv (make-shape-vec-api)) (wsv 'fetch-pairs) (define wss (add-pair-stars wsv)) (wss 'left-basis-size) (define var (Variable "$api-right-star")) (define term (wss 'make-pair (WordNode "d") var)) (define b (Bind (TypedVariable var (Type "EvaluationLink")) term term)) (use-modules (opencog exec)) (define setl (cog-execute! b)) (cog-arity setl) ; 36320 ... wtf!? ohh, bug (define o (cog-outgoing-set setl)) (length o) ; 101856 .. wtf!? (list-ref o 92387) ; borken... /* Arity is currently an unsigned short */ uhh ohhh QueryLink::do_execute Implicator impl(as); impl.implicand = this->get_implicand(); impl.satisfy(PatternLinkCast(get_handle())); QueueValuePtr ohhh nooo 26915 ohhh nooo 4473 ohhh nooo 623 ohhh nooo 10725 OK ohhh nooo 4358 But (list-ref o 92387) is still the location of the baddie!? 
OK 4555
atomic 4382/4474/28622 4402/33396 4286/8018

report_grounding
   bool have_var=false;
   for (const auto& j: var_soln)
   {
      Handle var(j.first);
      Type vtype = var->get_type();
      if (VARIABLE_NODE == vtype) have_var = true;
   }
   if (not have_var)
      printf("duuude missing var grounding on %d\n", cnt.load());

duuude FINITO var grounding at gnt=4106
[FINE] ==================== FINITO! accepted= 0
[FINE] There are groundings for 1 terms
[FINE] Groundings for 0 variables:
[FINE] Groundings for 0 clauses:
[FINE] xxxxxxxxxx neighbor_search xxxxxxxxxx
Loop candidate (4554/108680):
incomplete search.

Normally:
Loop candidate
Clause is matchable; start matching i
-- explore_clause_direct

duuude FINITO var grounding at gnt=11121
duuude FINITO var grounding at gnt=22310
duuude FINITO var grounding at gnt=22377

insert into NACK .. all of them.
cache hit -- the four failures!
search: inserting into cache -- each and every one.
search: no key must explore -- zero hits in good or bad.

cache hit: term is clause, grnd is the variable.
cached val is clause

Q: why is var not written!?
Q: why is it erratic?
A: because neighbor-search is not in consistent order.

_search_set at line 1096 is a HandleSeq from _start_choices
get_incoming_set TermMatchMixin.cc line 473

wow when sorted, it's 345 baddies.
why does the number of hits vary?

weird: _pat->clause_variables.at() seems to contain terms...
line 1964 or not..
   const HandleSeq& clvars(_pat->clause_variables.at(pclause));
   size_t cvsz = clvars.size();
   if (1 == cvsz and
   const Handle& jgnd(var_grounding.at(clvars[0]));
   key = HandleSeq({clause_root, jgnd});

$- f- h+ g+ a+

So... explore ... find, fail, insert into nack.

there are 101855 reported,
there are 108680 in loops
there are 108335 nacks (which with 345 hits add up.)
there are 101855 cache insertions.
there are 102200 FINITO's (= 101855 + 345)
all the rest are shapes

(length (cog-incoming-set (PredicateNode "*-shape-*")))
510993 wow.
(length (cog-incoming-set (PredicateNode "*-word-shape pair-*")))
620436
(length (cog-incoming-set (WordNode "d")))
110908 ; ok that is the thinnest.
(length (cog-incoming-by-type (WordNode "d") 'EvaluationLink))
108680 ; ok that's the loop, then. and clearly some of these are shapes.

why are there no nacks, despite nack insertions?
why are there no "no keys"? Because the clause is always cacheable.

Yow. That was hard work. Fixed in
https://github.com/opencog/atomspace/pull/2803

-------------------------------------------
So, where were we?

(define wsv (make-shape-vec-api))
(wsv 'fetch-pairs)
(define wss (add-pair-stars wsv))
(wss 'left-basis-size)
((add-support-compute wss) 'cache-all)
(define wst (batch-transpose wss))
(wst 'mmt-marginals)

Holy cow:
Stored 40000 of 510993 left-wilds in 101 secs (396 pairs/sec)
Stored 80000 of 510993 left-wilds in 356 secs (112 pairs/sec)
Stored 120000 of 510993 left-wilds in 607 secs (66 pairs/sec)
Stored 160000 of 510993 left-wilds in 865 secs (46 pairs/sec)
Stored 200000 of 510993 left-wilds in 1075 secs (37 pairs/sec)
Stored 240000 of 510993 left-wilds in 1352 secs (30 pairs/sec)
Stored 280000 of 510993 left-wilds in 1591 secs (25 pairs/sec)
Stored 320000 of 510993 left-wilds in 1841 secs (22 pairs/sec)
Stored 360000 of 510993 left-wilds in 2076 secs (19 pairs/sec)
Stored 400000 of 510993 left-wilds in 2296 secs (17 pairs/sec)
Stored 440000 of 510993 left-wilds in 2584 secs (15 pairs/sec)
Stored 480000 of 510993 left-wilds in 2871 secs (14 pairs/sec)
Done storing 510993 left-wilds in 19969 secs

freaking disaster. (The secs above are per block of 40000, not
cumulative: they sum to the 19969 total, so each block is slower
than the one before.)

try again with postgres!?
open mpg_parse.rdb and load all of it
then save to postgres

. config/0*h
.
config/3*h guile -l ~/src/learn/run-common/cogserver.scm (load-atomspace) (cog-close storage-node) (use-modules (opencog persist-sql)) (sql-create "postgres:///expt_16_shape") (sql-open "postgres:///expt_16_shape") (store-atomspace) ^D Stored 100K atoms in 377 seconds (265 per second) Stored 200K atoms in 720 seconds (277 per second) Finished storing 232500 atoms total, in 833 seconds (279 per second) vi config/3*h use postgres (define wsv (make-shape-vec-api)) (wsv 'fetch-pairs) Elapsed time to load word sections: 39 seconds Elapsed time to load word-shape pairs: 0 seconds Elapsed time to create shapes: 31 secs vs. Rocks: Elapsed time to load word sections: 7 seconds Elapsed time to load word-shape pairs: 0 seconds Elapsed time to create shapes: 35 secs So postgres had 6x slower load, at least the first time... Ouch. But .... if we use Rocks to open the shape db which has the shapes in it, then disaster strikes: Elapsed time to load word sections: 112 seconds What if we do the same for postgres? Elapsed time to load word sections: 29 seconds so it's actually a little faster than it was the first time... (define wss (add-pair-stars wsv)) (wss 'left-basis-size) ((add-support-compute wss) 'cache-all) Finished left norm marginals in 184 secs Finished left totals in 8 secs Finished right norm marginals in 28 secs Finished right totals in 0 secs vs rocks: Finished left norm marginals in 189 secs Finished left totals in 9 secs Finished right norm marginals in 29 secs Finished right totals in 0 secs So identical, here. 
(define wst (batch-transpose wss)) (wst 'mmt-marginals) Finished mmt norm marginals in 66 secs Finished mmt totals in 0 secs Done storing 12 right-wilds in 0 secs Stored 40000 of 510993 left-wilds in 170 secs (235 pairs/sec) Stored 80000 of 510993 left-wilds in 168 secs (238 pairs/sec) Stored 120000 of 510993 left-wilds in 188 secs (213 pairs/sec) Stored 160000 of 510993 left-wilds in 177 secs (226 pairs/sec) Stored 200000 of 510993 left-wilds in 171 secs (234 pairs/sec) Stored 240000 of 510993 left-wilds in 165 secs (242 pairs/sec) Stored 280000 of 510993 left-wilds in 164 secs (244 pairs/sec) Stored 320000 of 510993 left-wilds in 176 secs (227 pairs/sec) Stored 360000 of 510993 left-wilds in 163 secs (245 pairs/sec) Stored 400000 of 510993 left-wilds in 189 secs (212 pairs/sec) Stored 440000 of 510993 left-wilds in 187 secs (214 pairs/sec) Stored 480000 of 510993 left-wilds in 185 secs (216 pairs/sec) Done storing 510993 left-wilds in 2246 secs vs rocks: Finished mmt norm marginals in 50 secs Finished mmt totals in 0 secs Done storing 12 right-wilds in 0 secs ... Done storing 510993 left-wilds in 19969 secs so here, where it matters, postgres is 9x faster, running in 37 minutes instead of 5.5 hours. Sheesh. Why is this shit so slow? Oh, hang on: redoing the above on Rocks, where the Atoms and Values are already there, i.e. writing out exactly the same data as what is there, gives a whole new result: ... Stored 440000 of 510993 left-wilds in 3 secs (13333 pairs/sec) Stored 480000 of 510993 left-wilds in 4 secs (10000 pairs/sec) Done storing 510993 left-wilds in 43 secs Wow! This suggests that the first time, maybe Rocks was recomputing some index with each and every store? Try this with postgres: (i.e. re-store exactly the same data) ... and that. too, is ... faster! Go figure!? ... 
Stored 440000 of 510993 left-wilds in 38 secs (1053 pairs/sec)
Stored 480000 of 510993 left-wilds in 38 secs (1053 pairs/sec)
Done storing 510993 left-wilds in 500 secs

so that is 5x faster than the initial store. The magic of indexing.

BTW, a rocks DB holding 750K atoms takes about 605M bytes
so that works out to under 1 KByte per atom (roughly 800 bytes).
So more compact than the in-RAM storage!

-----------------------------------------------------
Concatenation of vectors.

(define pca (make-pseudo-cset-api))
(define psa (add-pair-stars pca))
(psa 'fetch-pairs)
Elapsed time to load csets: 292 secs
(length (psa 'get-all-elts)) ; 80807 so 277 atoms/sec

(define wsv (make-shape-vec-api))
(define wss (add-pair-stars wsv))
(wsv 'fetch-pairs)
Elapsed time to load word sections: 110 seconds
Elapsed time to load word-shape pairs: 0 seconds
Elapsed time to create shapes: 31 secs
XXX why is it creating? Aren't they already there?

(length (wss 'get-all-elts)) ; 620435

(psa 'right-basis-size) ; 75667
(wss 'right-basis-size) ; 510993

(load "/tmp/c.scm")
(define cac (direct-sum psa wss))
(cac 'right-basis-size) ; 586660 yay
(length (cac 'get-all-elts)) ; 701242 = 80807 + 620435 yes, good

-----------------------
Detour: Verify threading/locking behavior for batch-similarity.
With rocks.

(define pca (make-pseudo-cset-api))
(define psa (add-pair-stars pca))
(psa 'fetch-pairs)
(define sim (batch-similarity psa #f))
(sim 'batch-compute 12)
Done 10/12 Frac=47.69% Time: 71 Done: 98.5% Rate=0.915 prs/sec (1.092 sec/pr)

again a few more times, for a baseline:
Done 10/12 Frac= 0.0% Time: 58 Done: 98.5% Rate=0.586 prs/sec (1.706 sec/pr)
Done 10/12 Frac= 0.0% Time: 51 Done: 98.5% Rate=0.667 prs/sec ( 1.5 sec/pr)
Done 10/12 Frac= 0.0% Time: 50 Done: 98.5% Rate= 0.68 prs/sec (1.471 sec/pr)

(sim 'parallel-batch 12 2)
Fuu not enough to report. Do the left instead.
(define lim (batch-similarity psa #t)) (lim 'batch-compute 300) Time: 41 Time: 17 Time: 17 (lim 'batch-compute 500) Time: 103 Time: 63 Time: 64 (lim 'parallel-batch 500 2) Time: 67 (lim 'parallel-batch 500 3) Time: 71 (lim 'parallel-batch 500 4) Time: 70 (lim 'parallel-batch 500 8) Time: 72 (lim 'parallel-batch 500 16) Time: 72 (lim 'parallel-batch 1000 8) Time: 563 OK, so guile-3.0.1-deb+1-2 offers no improvement at all. Bummer. ----------------------- OK back to work. (define pca (make-pseudo-cset-api)) (define psa (add-pair-stars pca)) (psa 'fetch-pairs) (define wsv (make-shape-vec-api)) (define wss (add-pair-stars wsv)) (wsv 'fetch-pairs) (define cac (direct-sum psa wss)) (define csc (add-pair-stars cac)) Then do run-common/marginals-mst.scm i.e. do (define cuc (add-support-compute csc)) (cuc 'cache-all) ... but that won't work, it runs at 1.3 columns/sec ... for 586660 entries it will take 4.5 days but 'all-right-marginals is fast: 49 seconds. OK, that was a caching bug. Now runs at 2K/sec total time of 405 secs. for each right (i.e. for each dj) do: sum over left-stars for that dj. .... finding the left-stars is slow... why, again? (define dj (list-ref (csc 'right-basis) 42)) ; is a dj (csc 'left-stars dj) get-internal-run-time (api-obj (add-support-api star-obj)) calls (set-norms (LLOBJ 'left-wildcard ITEM) L0 L1 L2)) (define sns (get-internal-run-time)) (define wild (LLOBJ 'left-wildcard ITEM)) (define ens (get-internal-run-time)) (format #t "get-wild took ~5f millis\n" (/ (- ens sns) 1000000.0)) Foo, do we actually need left marginals, or are we just spewing CPU time? ((make-central-compute csc) 'cache-all) ((make-central-compute csc) 'cache-total) Above isn't needed, the batcher does all this, yeah!? run-common/marginals-mst.scm does them .. why? 
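Note to self on the ad-hoc timing above: dividing the get-internal-run-time difference by 1000000.0 gives milliseconds only if the internal time unit is a nanosecond. Also, get-internal-run-time counts CPU time, so time spent blocked on the database would not show up. A small sketch of a reusable timer, in plain Guile; `time-millis` is a made-up helper name, and the summing thunk is just a stand-in for the real `(LLOBJ 'left-wildcard ITEM)` call.

```scheme
(use-modules (srfi srfi-1))

;; Run THUNK once and return two values: its result, and the elapsed
;; milliseconds as measured by CLOCK. CLOCK is either
;; get-internal-run-time (CPU time) or get-internal-real-time (wall clock).
;; The conversion uses internal-time-units-per-second instead of
;; hard-coding the unit.
(define (time-millis clock thunk)
  (let* ((start (clock))
         (result (thunk))
         (end (clock))
         (millis (* 1000.0 (/ (- end start) internal-time-units-per-second))))
    (values result millis)))

;; Stand-in workload; replace with the call being profiled.
(define (stand-in-work)
  (fold + 0 (iota 100000)))

(call-with-values
  (lambda () (time-millis get-internal-real-time stand-in-work))
  (lambda (result millis)
    (format #t "wall: result ~A took ~A millis\n" result millis)))

(call-with-values
  (lambda () (time-millis get-internal-run-time stand-in-work))
  (lambda (result millis)
    (format #t "cpu:  result ~A took ~A millis\n" result millis)))
```

If the wall-clock number is much larger than the CPU number for the left-stars lookup, the time is going to waiting (storage, locks) rather than to computation.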
((add-support-compute psa) 'cache-all)
((make-central-compute psa) 'cache-total)

(define btc (batch-transpose csc))
(btc 'mmt-marginals)

(print-matrix-summary-report cac)

(gram-classify-greedy-fuzz csc 0.65 0.3 4)
(gram-classify-greedy-disinfo csc 3 4)

... make-disinfo uses make-pseudo-cset-api and we don't want that...
it does use dynamic-stars...

(current-time) start-time similari elapsed-count
(define elapsed-secs (make-elapsed-secs))

((make-store (make-pseudo-cset-api)) 'store-all)
((make-store (make-gram-class-api)) 'store-all)

hang on... (make-store cac) will work

(cac 'right-type)
(cac 'pair-type

(if (LLOBJ 'provides 'make-left-stars) (LLOBJ 'make-left-stars))

who fetches marginals? forgot to do that on direct-sum!
FIXED
(PredicateNode (string-append "*-Direct Sum Wild " id-string))

--------------------
start again.

(define pca (make-pseudo-cset-api))
(define psa (add-pair-stars pca))
(define wsv (make-shape-vec-api))
(define wss (add-pair-stars wsv))
(define cac (direct-sum psa wss))
(define csc (add-pair-stars cac))
(csc 'fetch-pairs)

(define btc (batch-transpose csc))
(btc 'mmt-marginals)
(print-matrix-summary-report csc)
((make-store csc) 'store-all) ; ... is this needed? It's slow!
(print-matrix-summary-report csc)

(gram-classify-greedy-disinfo csc 3 4)
(gram-classify-greedy-discrim csc 0.5 4)

(define csr (add-report-api csc))
(csr 'num-pairs)
(define ww (csc 'wild-wild))
.. got nothing... did btc not store it?
Hmmm fresh start; only keys on ww are MMT product key

(define support-obj (add-support-api csc))
(support-obj 'total-support-left)
*unspecified*

(throw 'bad-summation 'compute-total-entropy

link-generator -l learned -c 123123123 -s 3
LEFT-WALL e a !
LEFT-WALL e a !
LEFT-WALL j a !
LEFT-WALL a h !
vs
LEFT-WALL j d !
LEFT-WALL j b !
LEFT-WALL j a !
LEFT-WALL j j !
LEFT-WALL j e !

generator is under-generating...
j a ! has three linkages, only one generated.
e a ! " "
oh, I see all three are the same ...
but there are supposed to be no synonyms, yet accidentally there are...
where are there accidental synonyms? Hmm.
Answer: because word-classes list multiple poses, and the poses are
effectively synonym classes. So.... we're not really generating langs
correctly. Shoot.

But also, given the dict, we're also not generating the full corpus,
either; we're generating a corpus that is too small.

---
wild-wild-count
left-element

verify clustering
--- To-do=11 ncls=0 sing=1 nredo=0 2021-03-31 00:06:54 -- "j" ---
Dist=0.9553 for word "e" -- "j" in 30.10 secs
---------Bingo! Dist=0.9553 for word "e" -- "j"
---------Merged 201883 sections in 1102. secs; 183.22 scts/sec
--- Greedy-checking next 10 items

(cog-get-atoms 'WordClassNode)
(define ie (cog-incoming-by-type (WordClass "e j") 'Evaluation))
(length ie) ; 200115
(define is (cog-incoming-by-type (WordClass "e j") 'Section))
(length is) ; 1768

Holy cow! How many sections and shapes were on e and j to begin with!?
(define ee (cog-incoming-by-type (Word "e") 'Evaluation))
(length ee) ; 124006
(define es (cog-incoming-by-type (Word "e") 'Section))
(length es) ; 1416
(define je (cog-incoming-by-type (Word "j") 'Evaluation))
(length je) ; 108047
(define js (cog-incoming-by-type (Word "j") 'Section))
(length js) ; 1195

Wow. There sure are a lot of shapes. A heck of a lot more than .. expected.
Why? It's a hundred-to-one blow-up...

(cog-report-counts)
(Section . 80807) (EvaluationLink . 1131429)
so 14x more shapes than sections, which seems too much. overcounting.

(length (cog-incoming-set (PredicateNode "*-shape-*")))
510993 so only 6.3x more
(length (cog-incoming-set (PredicateNode "*-word-shape pair-*")))
620435 so 7.7x more

(for-each (lambda (wn)
      (format #t "~A is ~A\n" (cog-name wn)
         (length (cog-incoming-by-type wn 'Section))))
   (cog-get-atoms 'Word))
a is 21178
d is 2226
g is 591
b is 1212
###LEFT-WALL### is 9113
j is 1195
i is 2113
h is 1558
c is 580
f is 781
e is 1416
!
is 38844 (for-each (lambda (wn) (format #t "~A is ~A\n" (cog-name wn) (length (cog-incoming-by-type wn 'Evaluation)))) (cog-get-atoms 'Word)) a is 177916 d is 108679 g is 9004 b is 42191 ###LEFT-WALL### is 122085 j is 108047 i is 14077 h is 40231 c is 33587 f is 40733 e is 124006 ! is 310872 How many connectors in a shape? How many connectors in a section? (use-modules (srfi srfi-1)) (fold (lambda (sec cnt) (+ cnt (cog-arity sec))) 0 (cog-get-atoms 'Section)) ; 161614 so double (Section . 80807) (fold (lambda (sec cnt) (+ cnt (cog-arity (gdr sec)))) 0 (cog-get-atoms 'Section)) 620435 connectors total, so exactly equal to (length (cog-incoming-set (PredicateNode "*-word-shape pair-*"))) and averages out to 7.6779858 connectors per section. Wow. That's a lot. That's huge. ; How many sections of a given length? (for-each (lambda (ARI) (format #t "len ~A secs ~A\n" ARI (fold (lambda (sec cnt) (if (= ARI (cog-arity (gdr sec))) (+ 1 cnt) cnt)) 0 (cog-get-atoms 'Section)))) (iota 20)) len 0 secs 0 len 1 secs 0 len 2 secs 606 len 3 secs 2311 len 4 secs 3723 len 5 secs 5150 len 6 secs 8387 len 7 secs 13907 len 8 secs 17362 len 9 secs 15043 len 10 secs 9061 len 11 secs 4000 len 12 secs 1257 len 13 secs 0 len 14 secs 0 So its not at all zipfian. 
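Sanity check on the length histogram above: weighting each length by its section count should give back the 80807 sections, the 620435 connectors, and the 7.68 average. A minimal C++ sketch of that arithmetic, with the counts copied from the "len N secs M" listing. The names (kSectionsByLength, total_sections, total_connectors, mean_connectors_per_section) are made up for this note and are not part of any AtomSpace API.

```cpp
#include <cassert>
#include <map>

// Connector-sequence length -> number of sections of that length,
// copied from the "len N secs M" listing (zero-count rows omitted).
using Histogram = std::map<int, long>;

const Histogram kSectionsByLength = {
    {2, 606},   {3, 2311},  {4, 3723},  {5, 5150},  {6, 8387}, {7, 13907},
    {8, 17362}, {9, 15043}, {10, 9061}, {11, 4000}, {12, 1257}};

// Total number of sections: sum of the per-length counts.
long total_sections(const Histogram& hist)
{
    long total = 0;
    for (const auto& entry : hist) total += entry.second;
    return total;
}

// Total number of connectors: each section contributes its length.
long total_connectors(const Histogram& hist)
{
    long total = 0;
    for (const auto& entry : hist) total += entry.first * entry.second;
    return total;
}

// Average connectors per section.
double mean_connectors_per_section(const Histogram& hist)
{
    const long sections = total_sections(hist);
    assert(sections > 0);  // an empty histogram has no meaningful average
    return static_cast<double>(total_connectors(hist)) / sections;
}
```

Both totals agree with the counts quoted above, so the per-length listing accounts for every section and every connector; the bulk of the mass sits at lengths 7 to 9.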
(for-each (lambda (ARI) (format #t "Length= ~A NSections= ~A\n" ARI (fold (lambda (sec cnt) (if (= ARI (cog-arity (gdr sec))) (+ 1 cnt) cnt)) 0 (cog-get-atoms 'Section)))) (iota 20)) (for-each (lambda (ARI) (format #t "Length= ~A NConSeq= ~A\n" ARI (fold (lambda (sec cnt) (if (= ARI (cog-arity sec)) (+ 1 cnt) cnt)) 0 (cog-get-atoms 'ConnectorSeq)))) (iota 15)) (for-each (lambda (ARI) (format #t "Length= ~A NSections= ~A\n" ARI (fold (lambda (sec cnt) (if (and (= ARI (cog-arity (gdr sec))) (< 4 (cog-count sec))) (+ 1 cnt) cnt)) 0 (cog-get-atoms 'Section)))) (iota 15)) ----------------------- merge-frac does the merging (use-modules (opencog exec)) (define b (Bind ; (VariableList) (And (ConnectorSeq (Glob "$initial seq") (Connector (Word "a") (ConnectorDir "+")) (Glob "$final seq")) (ConnectorSeq (Glob "$initial seq") (Connector (Word "b") (ConnectorDir "+")) (Glob "$final seq"))) (ConnectorSeq (Glob "$initial seq") (Connector (Word "b") (ConnectorDir "+")) (Glob "$final seq")))) (define start-time (current-time)) (define r (cog-execute! b)) (format #t "Elapsed ~A\n" (- (current-time) start-time)) ; 700 seconds ! yikes to traverse 75669 connector seqs (cog-arity r) ; 184 ... 181 (define ca (Connector (Word "a") (ConnectorDir "+"))) (define cb (Connector (Word "b") (ConnectorDir "+"))) (define ca-seq (cog-incoming-by-type ca 'ConnectorSeq)) (define cb-seq (cog-incoming-by-type cb 'ConnectorSeq)) (length ca-seq) ; 10466 (length cb-seq) ; 18342 (define SEQ (list-ref ca-seq 3)) (define init (take-while (lambda (CON) (not (equal? ca CON))) (cog-outgoing-set SEQ))) (define find (span (lambda (CON) (not (equal? ca CON))) (cog-outgoing-set SEQ)) (define rw (cog-link 'ConnectorSeq (map (lambda (CON) (if (equal? ca CON) cb CON)) (cog-outgoing-set SEQ)))) (define eq2 (remove (lambda (PR) (null? (cdr PR))) (map (lambda (SEQ) (cons SEQ (cog-link 'ConnectorSeq (map (lambda (CON) (if (equal? 
ca CON) cb CON)) (cog-outgoing-set SEQ))))) ca-seq))) (length eq) ; 272 why is this different than (cog-arity r) which is 182? (length (delete-dup-atoms eq)) ; 262 !! what's duplicated in eq? (length (keep-duplicate-atoms eq)) (define er (cog-outgoing-set r)) (length (atoms-subtract er eq)) ; 10 (length (atoms-subtract eq er)) ; 91 gram-projective.scm gram-7-junk.rdb is (gram-classify-greedy-discrim csc 0.25 4) ---------- Old API (define pca (make-pseudo-cset-api)) (define psa (add-pair-stars pca)) (define wsv (make-shape-vec-api)) (define wss (add-pair-stars wsv)) (define cac (direct-sum psa wss)) (define csc (add-pair-stars cac)) (csc 'fetch-pairs) Baseline run: Used 80807 sections to create 620435 x-sections Start greedy-agglomeration of 12 words Existing classes=0 singletons=0 done=0 --- To-do=12 ncls=0 sing=0 nredo=0 2021-04-03 19:39:24 -- "e" --- --- To-do=11 ncls=0 sing=1 nredo=0 2021-04-03 19:39:34 -- "j" --- Dist=0.9553 for word "e" -- "j" in 22.73 secs ---------Bingo! Dist=0.9553 for word "e" -- "j" ---------Merged 201883 sections in 1272. secs; 158.73 scts/sec Deleted 2446 sections and 224608 cross-sections tot=228584 l= 121992 lx= 120576 r= 106592 rx= 105397 New code: ---------Merged 201883 sections in 35.73 secs; 5650.6 scts/sec Deleted 2446 sections and 105397 cross-sections Deleted 2446 sections and 105397 cross-sections tot= l= 1416 lx= 0 r= 106592 rx= 105397 all-pairs before=701242 after=593399 ah hah pair A is never a Cross... (define WA (Word "e")) (define WB (Word "j")) (define (bogus a b) (format #t "Its ~A and ~A\n" a b)) (define ptu (add-tuple-math csc bogus)) (define perls (ptu 'right-stars (list WA WB))) (length perls) ; 201883 (define nxc 0) (for-each (lambda (spr) (define asec (first spr)) (if (and (not (null? asec)) (eq? (cog-type asec) 'CrossSection)) (set! nxc (+ 1 nxc)))) perls) nxc ; 120576 Ohh is-singleton fails! (csc 'left-type) ; WordNode (define (is-singleton-sect? sect) (define LLOBJ csc) (eq? 
(LLOBJ 'left-type) (cog-type (LLOBJ 'left-element sect)))) (any (lambda (spr) (define asec (first spr)) (if (and (not (null? asec)) (eq? (cog-type asec) 'CrossSection) (not (is-singleton-sect? asec))) (format #t "fail at ~A\n" asec) #f)) perls) (define fa (CrossSection (ctv 1 0 1) (WordNode "e") (ShapeLink (WordNode "###LEFT-WALL###") (Connector (WordNode "j") (ConnectorDir "+")) (Connector (WordNode "j") (ConnectorDir "+")) (Connector (WordNode "g") (ConnectorDir "+")) (Connector (VariableNode "$connector-word") (ConnectorDir "+")) (Connector (WordNode "e") (ConnectorDir "+")) (Connector (WordNode "!") (ConnectorDir "+"))))) (csc 'left-element fa) is wrong! wtf!? is that even possible!? .. buggy? the new code merges differently!? Hmm. OK, so cross-sections are working differently.... Why are the x-sections working differently? old code: --- Greedy-checking next 10 items Dist=0.0545 for class "e j" -- "!" in 36.12 secs Dist=0.6606 for class "e j" -- "d" in 33.66 secs ---------Bingo! Dist=0.6606 for class "e j" -- "d" ---------Merged 284538 sections in 295.1 secs; 964.13 scts/sec vs new code: --- Greedy-checking next 10 items Dist= 0.0 for class "e j" -- "!" in 10.96 secs Dist=0.2479 for class "e j" -- "d" in 17.03 secs Dist=0.0609 for class "e j" -- "a" in 10.83 secs Dist= 0.0 for class "e j" -- "###LEFT-WALL###" in 9.620 secs So the new distances are completely different.. wtf!? Dist=0.3132 for class "e j" -- "a" in 46.02 secs ---------Bingo! Dist=0.3132 for class "e j" -- "a" ---------Merged 355088 sections in 454.0 secs; 782.08 scts/sec Dist=0.0953 for class "e j" -- "###LEFT-WALL###" in 51.58 secs Dist=0.6138 for class "e j" -- "b" in 36.79 secs ---------Bingo! Dist=0.6138 for class "e j" -- "b" ---------Merged 383586 sections in 491.6 secs; 780.29 scts/sec Dist=0.5797 for class "e j" -- "f" in 50.06 secs ---------Bingo! 
Dist=0.5797 for class "e j" -- "f" ---------Merged 409308 sections in 516.3 secs; 792.78 scts/sec Dist=0.6088 for class "e j" -- "h" in 57.82 secs ---------Bingo! Dist=0.6088 for class "e j" -- "h" ---------Merged 430864 sections in 601.2 secs; 716.70 scts/sec Dist=0.6421 for class "e j" -- "c" in 56.41 secs ---------Bingo! Dist=0.6421 for class "e j" -- "c" ---------Merged 449112 sections in 635.7 secs; 706.50 scts/sec Dist=0.3095 for class "e j" -- "g" in 56.84 secs ---------Bingo! Dist=0.3095 for class "e j" -- "g" ---------Merged 454015 sections in 590.7 secs; 768.54 scts/sec Dist=0.1362 for class "e j" -- "i" in 49.31 secs --- Checking the done-list len=0 ---- Remaining count = 1372.9 of 694217.0 for "j" --- To-do=3 ncls=1 sing=0 nredo=9 2021-04-03 21:21:22 -- "!" --- Dist=0.0865 for class "e j" -- "!" in 83.39 secs --- To-do=2 ncls=1 sing=1 nredo=9 2021-04-03 21:22:51 -- "###LEFT-WALL###" --- Dist=0.1064 for class "e j" -- "###LEFT-WALL###" in 66.43 secs Dist= 0.0 for word "!" -- "###LEFT-WALL###" in 25.67 secs --- To-do=1 ncls=1 sing=2 nredo=9 2021-04-03 21:24:27 -- "i" --- Dist=0.1362 for class "e j" -- "i" in 57.25 secs Dist=0.3250 for word "!" -- "i" in 13.69 secs ---------Bingo! Dist=0.3250 for word "!" -- "i" ---------Merged 76771 sections in 148.1 secs; 518.48 scts/sec --- Greedy-checking next 0 items --- Checking the done-list len=9 Dist=0.0001 for class "! i" -- "e" in 8.513 secs Dist=0.0002 for class "! i" -- "j" in 6.067 secs Dist=0.0283 for class "! i" -- "d" in 5.023 secs Dist=0.2085 for class "! i" -- "a" in 6.128 secs Dist=0.0017 for class "! i" -- "b" in 6.049 secs Dist= 0.0 for class "! i" -- "f" in 4.532 secs Dist=0.0008 for class "! i" -- "h" in 5.540 secs Dist= 0.0 for class "! i" -- "c" in 4.527 secs Dist=0.0008 for class "! i" -- "g" in 5.029 secs ---- Remaining count = 9567.1 of 24508. 
for "i"
--- To-do=0 ncls=2 sing=1 nredo=11 2021-04-03 21:29:24 -- "()" ---
Finished greedy-agglomeration: 11 words assigned to 2 classes

--------------------
start again.
(define pca (make-pseudo-cset-api))
(define psa (add-pair-stars pca))
(define wsv (add-shape-vec-api psa))
(define wss (add-pair-stars wsv))
(define cac (direct-sum psa wss))
(define csc (add-pair-stars cac))
(csc 'fetch-pairs)
(gram-classify-greedy-discrim csc 0.25 4)

Bad method call on pseudo-cset: get-all-pairs
(define cls (merge-frac pcos cos-fraction ZIPF WORD-A WORD-B 'WordClass)) 329
clobber on pseudo and on shapes ... via provides.../
all-pairs vs get-all-elts
filter.scm needs a clobber

--------------------------
OK all bugs fixed, all code modernized.
Once again, connector merging is next.
gram-projective.scm merge-row-pairs takes a, b, rowid sums count and sets it.
cset-merge.scm (matching-sections CON-A CON-B) return list of pairs
(define monitor-rate (make-rate-monitor))

/home/ubuntu/src/atomspace-rocks/opencog/persist/rocks/RocksIO.cc :570 well 565
remFromSidList
remin osatom=ShapeLink - the osatom from which the xsection is to be removed.
reminc satom=2Vy3 (CrossSection
well foo, delete was n=0 r=0 satom=2Vy3 (CrossSection

removeSatom called under lock guard mtx_list
writeAtom locks _mtx_sid, then unlocks it before storing incoming set.
.... the write of incoming set is done under _mtx_list but that is a distinct lock.
std::recursive_mutex _mtx_list;
std::mutex _mtx_sid;

error in accessing i@
remFromSidList called from remIncoming called from removeSatom called from ... already covered
removeSatom called from removeSatom recursively
removeAtom with mtx_list lock held.
i@ handlers:
removeSatom with mtx_list held
getIncomingSet
writeAtom() with mtx_list held
remIncoming()

success remFromSidList G1 from i@P:ListLink in tid=13712
in remFromSidList G1 not found in list i@P:ListLink in tid=13709
so incoming is removed first, then the atom
SilentException NotFoundException 593

OK, revised code:
removeSatom from removeAtom with mtx_list held
writeAtom writes a@ not holding any lock
still get an error 11

rocks: 16 18 18
simple 49 49 47 -- 49
non-simple: 8m3 7m32
pg: 8m12
pg-short: 20 20 20

uhh,
/home/linas/src/novamente/src/atomspace-git/opencog/persist/sql/multi-driver/SQLUUID.cc:133 get_uuid
SQLValues.cc:130
SQLValues.cc:564
SQLAtomStore.cc:124
./tests/persist/sql/multi-driver/MultiDeleteUTest

again: 50x50 for pg
pg - 31 31 31
rocks - 24 18 28 20 34

again 50x500 no barrier
cog-sim 39 39
cog-sto - 12 13 12

50x50 no barrier
rocks 17 27 no has dbg in it.
rocks 14 24 11
pg 7 7 7
50x100 pg 14 14

in removeSatom sid=91
in remFromSidList 91 not found in list i@2:ListLink

Enter writeAtom
In writeAtom new sid=Z1
In writeAtom onging w/sid=Z1
Enter writeAtom
Exit writeAtom existing sid=Z1

is the sid being added twice!? Yikes! near 273
oh it's OK

in writeAtom add sid=G1 to ist=i@5:ListLink
Added sid G1 new sid list for i@5:ListLink is EG1

size_t pos = sidlist.find(sid);
while (std::string::npos != pos and 0 < pos)
{
   if (' ' != sidlist[pos-1])
      pos = sidlist.find(sid, pos+1);
   else break;
}
600

Fixed in latest flurry of rocks pushes

terminate called after throwing an instance of 'opencog::InvalidParamException'
what(): Atom is already in the TLB, and UUID's don't match! (/home/linas/src/novamente/src/atomspace-git/opencog/atomspaceutils/TLB.cc:129)
SQLAtomLoad.cc:202 ... doGetLink ...
SQLAtomDelete.cc:106 UUID uuid = check_uuid(h);
StorageNode.cc:77
check_uuid(h) if not in tlbuf, doGetLink returns uuid in db, puts into tlbuf.addAtom(link, uuid_
so storeAtm must have ...gotten a new uuid!?
test_persist_sql

----------
wtf.
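Side note on the sidlist lookup loop earlier in this entry: a plain substring find() for "G1" also hits inside "EG1", which is why the loop checks for a preceding space. A minimal C++ sketch of a whole-token match follows. It assumes the sid list is space-separated (as the `' '` test in the loop suggests), and it additionally checks the character after the match, which the loop in these notes does not do. find_sid_token is a hypothetical helper written for this note, not the atomspace-rocks code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not the atomspace-rocks implementation).
// Returns the offset of `sid` in the space-separated `sidlist`,
// matching whole tokens only, or std::string::npos if absent.
std::string::size_type find_sid_token(const std::string& sidlist,
                                      const std::string& sid)
{
    assert(!sid.empty());  // an empty sid would match everywhere

    std::string::size_type pos = sidlist.find(sid);
    while (pos != std::string::npos)
    {
        const std::string::size_type end = pos + sid.size();

        // A token starts at offset 0 or right after a space ...
        const bool starts_token = (pos == 0) || (sidlist[pos - 1] == ' ');
        // ... and ends at the end of the list or right before a space.
        const bool ends_token =
            (end == sidlist.size()) || (sidlist[end] == ' ');

        if (starts_token && ends_token) return pos;

        // Substring of some longer sid; keep looking further right.
        pos = sidlist.find(sid, pos + 1);
    }
    return std::string::npos;
}
```

With this, looking up "G1" in "EG1" reports not-found instead of a false hit, and in "EG1 G1" it returns the offset of the second token.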
/home/ubuntu/src/atomspace-rocks/opencog/persist/rocks/RocksIO.cc:575
remove 5dy3 it's a CrossSection
it contains 6dy3 which is a ShapeLink
but 6dy3 has an empty incoming set...

(cog-rocks-open "rocks:///home/ubuntu/data/expt-16/gram-23-junk.rdb")
(cog-rocks-get "i@6dy3:")
(cog-rocks-get "i@6dy3:CrossSection")
It looks just fine, so wtf... Database corruption!

cleanup: Removed sid
in removeSatom sid=DNw3
in remincoming ist=i@:Section
inset key=i@:Section<

(cog-rocks-open "rocks:///home/ubuntu/data/expt-16/gram-24-junk.rdb")
(cog-rocks-get "i@DNw3:")
Throw to key `C++-EXCEPTION' with args `("cog-rocks-open" "Can't open file: Corruption: Can't access /048536.sst: IO error: while stat a file for size: /home/ubuntu/data/expt-16/gram-24-junk.rdb/048536.sst: No such file or directory\n (/home/ubuntu/src/atomspace-rocks/opencog/persist/rocks/RocksStorage.cc:86)\nFunction args:\n(rocks:///home/ubuntu/data/expt-16/gram-24-junk.rdb)")'.

in writeAtom add sid=gRL6 to ist=i@6Lx3:CrossSection
Added sid gRL6 to sidlist for i@6Lx3:CrossSection bef=0 aft=1 bef= aft=gRL6
in removeSatom sid=5Lx3
in remincoming ist=i@6Lx3:CrossSection
Error: Empty sidlist; can't find sid=5Lx3<
inset key=i@6Lx3:CrossSection<
(cog-rocks-print storage-node "i@6Lx3:CrossSection")
completely empty

scheme@(gram-class)> (cog-close storage-node)
scheme@(gram-class)> (cog-open storage-node)
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
Throw to key `C++-EXCEPTION' with args `("cog-open" "Can't open file: IO error: While open directory: /home/ubuntu/data/expt-16/gram-25-junk.rdb: Too many open files (/home/ubuntu/src/atomspace-rocks/opencog/persist/rocks/RocksStorage.cc:86)\nFunction args:\n((RocksStorageNode \"rocks:///home/ubuntu/data/expt-16/gram-25-junk.rdb\")\n)")'.

ah hah!
ulimit -n -- 1024
find |wc # 1009
lsof |grep guile | grep gram-25 |wc 42441
lsof -p 30945 |wc 1248
lsof -p 30945 |grep junk |wc 987
find ~/data/expt-16/gram-25-junk.rdb |grep sst |wc 989

So ...
closing rocks does not close the open file handles... wtf.
CancelAllBackgroundWork(_rfile, true);
rocksdb_close()
delete_obsolete_files_period_micros = 5ULL * 1000000;
max_background_jobs = 4;
allow_mmap_reads = true;
options.compaction_pri = kMinOverlappingRatio;

lsof -p 24949 | grep junk | wc 992
find ~/data/expt-16/gram-26-junk.rdb |grep sst |wc 986
(cog-rocks-print storage-node "i@A6y3:CrossSection")

opts.set_max_open_files(max_open_files);
int level0_file_num_compaction_trigger = 4;
soft_pending_compaction_bytes_limit

Wow. vast effing memleak: rocks is using 154 GBytes RAM!
https://github.com/facebook/rocksdb/issues/3216
avoid memleak by setting cache_index_and_filter_blocks=true
need to turn on block cache.

uhh. print-range fails when rocks is corrupting things!
wow Seeks are wrong!
appendToSidList remFromSidList loadInset fetch_incoming_set

sidlist style:
 9/12 Test  #9: MultiDeleteUTest ................. Passed 17.51 sec
11/12 Test #11: LargeFlatUTest ................... Passed 83.93 sec
12/12 Test #12: LargeZipfUTest ................... Passed 145.84 sec
Total Test time (real) = 265.20 sec
Total Test time (real) = 270.37 sec
Total Test time (real) = 285.15 sec

sidkey-style:
 9/12 Test  #9: MultiDeleteUTest ................. Passed 17.00 sec
11/12 Test #11: LargeFlatUTest ................... Passed 82.52 sec
12/12 Test #12: LargeZipfUTest ................... Passed 110.68 sec
Total Test time (real) = 225.92 sec
Total Test time (real) = 233.80 sec
Total Test time (real) = 226.48 sec
So it's actually faster.

Remove locks:
Total Test time (real) = 206.39 sec
Total Test time (real) = 207.73 sec

Fix leaking iterators: 10fb460556f197efa0effe9b2fe90eae7f892d10
Total Test time (real) = 184.45 sec
Total Test time (real) = 186.52 sec

--------------------
start again.
expt-17 cp -pr fake-corpus-w-wall pair-corpus ~/src/learn/run/2-word-pairs/run-all.sh ~/src/learn/run/3-mst-parsing/run-all-mst.sh (define pca (make-pseudo-cset-api)) (define psa (add-pair-stars pca)) (define wsv (add-shape-vec-api psa)) (define wss (add-pair-stars wsv)) (define cac (direct-sum psa wss)) (define csc (add-pair-stars cac)) (csc 'fetch-pairs) (define btc (batch-transpose csc)) (btc 'mmt-marginals) (print-matrix-summary-report csc) ((make-store csc) 'store-all) ; ... is this needed? Its slow! ------ its not needed. (gram-classify-greedy-discrim csc 0.25 4) starting with shape-naked ... --- To-do=15 ncls=0 sing=0 nredo=0 2021-04-11 22:55:07 -- "e" --- --- To-do=14 ncls=0 sing=1 nredo=0 2021-04-11 22:55:16 -- "WALL" --- Dist=0.0014 for word "e" -- "WALL" in 17.69 secs --- To-do=13 ncls=0 sing=2 nredo=0 2021-04-11 22:55:38 -- "j" --- Dist=0.9878 for word "e" -- "j" in 24.82 secs ---------Bingo! Dist=0.9878 for word "e" -- "j" ---------Merged 174041 sections in 48.00 secs; 3625.9 scts/sec --- Greedy-checking next 12 items Dist=0.6153 for class "e j" -- "d" in 24.99 secs ---------Bingo! Dist=0.6153 for class "e j" -- "d" ---------Merged 208189 sections in 104.0 secs; 2001.8 scts/sec Dist= 0.0 for class "e j d" -- "###LEFT-WALL###" in 27.79 secs Dist=0.0089 for class "e j d" -- "!" in 18.08 secs Dist=0.6402 for class "e j d" -- "a" in 22.23 secs ---------Bingo! Dist=0.6402 for class "e j d" -- "a" ---------Merged 238977 sections in 130.0 secs; 1838.3 scts/sec Dist= 0.0 for class "e j d a" -- "LEFT" in 30.90 secs Dist=0.7550 for class "e j d a" -- "b" in 26.49 secs ---------Bingo! Dist=0.7550 for class "e j d a" -- "b" ---------Merged 270094 sections in 152.0 secs; 1776.9 scts/sec Dist= 0.0 for class "e j d a b" -- "-" in 34.42 secs Dist=0.3393 for class "e j d a b" -- "f" in 25.15 secs ---------Bingo! 
Dist=0.3393 for class "e j d a b" -- "f" ---------Merged 279128 sections in 152.0 secs; 1836.4 scts/sec Dist=0.2493 for class "e j d a b f" -- "h" in 32.48 secs Dist=0.3591 for class "e j d a b f" -- "c" in 28.42 secs ---------Bingo! Dist=0.3591 for class "e j d a b f" -- "c" ---------Merged 284704 sections in 164.0 secs; 1736.0 scts/sec Dist=0.4848 for class "e j d a b f c" -- "g" in 31.99 secs ---------Bingo! Dist=0.4848 for class "e j d a b f c" -- "g" ---------Merged 288253 sections in 163.0 secs; 1768.4 scts/sec Dist=0.1007 for class "e j d a b f c g" -- "i" in 32.86 secs --- Checking the done-list len=0 ---- Remaining count = 172.63 of 829662.0 for "j" --- To-do=12 ncls=1 sing=1 nredo=2 2021-04-11 23:25:40 -- "d" --- Dist=0.0000 for word "WALL" -- "d" in 9.347 secs --- To-do=11 ncls=1 sing=2 nredo=2 2021-04-11 23:25:50 -- "###LEFT-WALL###" --- Dist= 0.0 for class "e j" -- "###LEFT-WALL###" in 6.495 secs Dist= 0.0 for word "WALL" -- "###LEFT-WALL###" in 11.76 secs Dist= 0.0 for word "d" -- "###LEFT-WALL###" in 5.480 secs --- To-do=10 ncls=1 sing=3 nredo=2 2021-04-11 23:26:17 -- "!" --- Dist= 0.0 for word "WALL" -- "!" in 6.542 secs Dist= 0.0 for word "###LEFT-WALL###" -- "!" in 8.203 secs Dist= 0.0 for word "LEFT" -- "-" in 9.220 secs Maxes out at 42GB RAM usage 6.9GB disk, before compaction 100 sst files 106 open DB files === then close the database (but not guile) 37 open DB files descriptors!! 100 sst files as before 6.8GB disk use Still 42GB RAM use == open the database Still 42GB RAM use 519 MBytes DB disk usage 13 sst files total 46 open file desciptors == close DB again 37 open DB file descs as before - these have been leaked. 13 sst files unchanged 515 MBytes DB disk use unchanged Bug: there is not supposed to be WALL and LEFT ... I guess this is a corpus bug!? A splitter bug? Bug: why is there a "-" word ???? wtf? Wrong corpus used. -------------------- start again. 
expt-18 get checkout https://github.com/facebook/rocksdb commit bb75092574532c5629c27dcd99fe55f5514af48c (HEAD -> master, origin/master, origin/HEAD) version 6.19 cp -pr fake-corpus-w-wall pair-corpus ~/src/learn/run/2-word-pairs/run-all.sh ~/src/learn/run/3-mst-parsing/run-all-mst.sh cp -pr mpg_parse.rdb shape.rdb (define pca (make-pseudo-cset-api)) (define psa (add-pair-stars pca)) (define wsv (add-shape-vec-api psa)) (define wss (add-pair-stars wsv)) (define cac (direct-sum psa wss)) (define csc (add-pair-stars cac)) (csc 'fetch-pairs) (define btc (batch-transpose csc)) (btc 'mmt-marginals) (print-matrix-summary-report csc) ; xx ((make-store csc) 'store-all) ; ... is this needed? Its slow! ------ its not needed. cp -pr shape.rdb gram-2-junk.rdb (gram-classify-greedy-discrim csc 0.25 4) log: after word-pairs, rocksdb compressed from xxx not measured. down to ls -la fake_pairs.rdb/*sst |wc # 4 du -s fake_pairs.rdb # 148 M ls -la mpg_parse.rdb/*sst | wc # 3 du -s mpg_parse.rdb # 242 M after reopen , compact to 145M ls -la shape.rdb/*sst | wc # 10 lsof -p 18889 | grep sst |wc # 10 du -s shape.rdb # 801 MB ps aux |grep guile # 7.36GB virt 4.85 GB rss (cog-close storage-node) no change to RAM du -s shape.rdb # 719 MB so some decrease ls -la shape.rdb/*sst | wc # 10 no change lsof -p 18889 | grep sst |wc # 3 so its leaking filedescs, still (cog-open storage-node) No change to RAM du -s shape.rdb # 514 MB ls -la shape.rdb/*sst | wc # 8 ls -la shape.rdb/*sst | wc # 11 of which 3 are marked "deleted" stop guile, start guile, stop guile: no change to storage ls -la shape.rdb/*sst | wc # 9 OK, greedy-discrim. (csc 'fetch-pairs) (gc-stats) ;25MB ps aux |grep guile # 6.4GB virt, 3.75 GB RSS (gram-classify-greedy-discrim csc 0.25 4) ps aux |grep guile # 37.8GB virt 35.2GB rss lsof -p 362 | grep sst |wc # 81 ls -la gram-1-junk.rdb/*sst |wc # 81 du -s gram-1-junk.rdb # 5.8 GB (cog-close storage-node) du -s gram-1-junk.rdb # 5.7 GB lsof -p 362 | grep sst |wc # 74 Yow! 
thats a big leak! (cog-open storage-node) du -s gram-1-junk.rdb # 606 MB - big shrink lsof -p 362 | grep sst |wc # 91 ls -la gram-1-junk.rdb/*sst |wc # 17 options.table_properties_collector_factories.emplace_back(NewCompactOnDeletionCollectorFactory(100, 90, /*deletion_ratio=*/0.5)); DestroyAndReopen(options); rocks issues: #3216 #4112 #8041 it's leaking iterators. Fixed in 10fb460556f197efa0effe9b2fe90eae7f892d10 df: 708513296 before starting gram-classify. lsof -p 25951 | grep sst |wc # 9 ls -la gram-2-junk.rdb/*sst |wc # 9 du -s gram-2-junk.rdb # 514 MB after finishing: du -s gram-2-junk.rdb # 1.01 GB ps aux |grep guile # 8.5 GB virt 5.6GB rss ls -la gram-2-junk.rdb/*sst |wc # 16 lsof -p 25951 | grep sst |wc # 16 after cog-close: lsof -p 25951 | grep sst |wc # zero! ls -la gram-2-junk.rdb/*sst |wc # 16 du -s gram-2-junk.rdb # 927 MB after cog-open: du -s gram-2-junk.rdb # 1.24 GB lsof -p 25951 | grep sst |wc # 17 ls -la gram-2-junk.rdb/*sst |wc # 17 after cog-close: du -s gram-2-junk.rdb # 602 MB df # 708601988 after exiting guile: df # 708601988 -- no change Perf w/o bloom filters: --- To-do=12 ncls=0 sing=0 nredo=0 2021-04-13 12:30:58 -- "e" --- --- To-do=0 ncls=4 sing=4 nredo=8 2021-04-13 13:09:18 -- "()" --- so 38 minutes. w/bloom filters: --- To-do=12 ncls=0 sing=0 nredo=0 2021-04-13 14:15:51 -- "e" --- --- To-do=0 ncls=4 sing=4 nredo=8 2021-04-13 14:54:40 -- "()" --- so 39 minutes. No difference. ---------------------------- Regression: MultiUserUTest first bad commit: b6fb7935a93a99e4f0f5b22c8c15288f2d0c5b2d the erase in TLB::removeAtom(UUID uuid) is needed for multi-user. but it also breaks MultiDelete. atom is not present... because it wasn't stored again, I guess.... not_yet_stored _uuid_map is erased. not_yet_stored might not be needed!? Fixed pull req #2810 ----------------------------- merge-row-pairs - change to two items merge-frac ptu merge-frac make-fuzz vs. make-discrim fixed-frac vs cos-fraction is only diff make-fuzz vs. 
make-disinfo mpred differs fraction differs Start greedy-agglomeration of 12 words Existing classes=0 singletons=0 done=0 --- To-do=12 ncls=0 sing=0 nredo=0 2021-04-13 14:15:51 -- "e" --- --- To-do=11 ncls=0 sing=1 nredo=0 2021-04-13 14:16:02 -- "j" --- Dist=0.9557 for word "e" -- "j" in 22.46 secs ---------Bingo! Dist=0.9557 for word "e" -- "j" ---------Merged 202021 sections in 56.00 secs; 3607.5 scts/sec --- Greedy-checking next 10 items Dist=0.0550 for class "e j" -- "!" in 30.12 secs Dist=0.6614 for class "e j" -- "d" in 29.77 secs ---------Bingo! Dist=0.6614 for class "e j" -- "d" ---------Merged 284859 sections in 136.0 secs; 2094.6 scts/sec Dist=0.3121 for class "e j d" -- "a" in 45.09 secs ---------Bingo! Dist=0.3121 for class "e j d" -- "a" ---------Merged 355630 sections in 174.0 secs; 2043.9 scts/sec Dist=0.0808 for class "e j d a" -- "###LEFT-WALL###" in 46.91 secs Dist=0.3918 for class "e j d a" -- "b" in 33.96 secs xxxxxxxx ---------Bingo! Dist=0.3918 for class "e j d a" -- "b" ---------Merged 384102 sections in 195.0 secs; 1969.8 scts/sec Dist=0.2467 for class "e j d a b" -- "f" in 51.38 secs Dist=0.2366 for class "e j d a b" -- "h" in 38.39 secs Dist=0.2554 for class "e j d a b" -- "c" in 35.86 secs ---------Bingo! Dist=0.2554 for class "e j d a b" -- "c" ---------Merged 405139 sections in 216.0 secs; 1875.6 scts/sec Dist=0.1044 for class "e j d a b c" -- "g" in 50.52 secs Dist=0.0226 for class "e j d a b c" -- "i" in 38.41 secs --- Checking the done-list len=0 ---- Remaining count = 1375.5 of 694231.0 for "j" --- To-do=10 ncls=1 sing=0 nredo=2 2021-04-13 14:43:47 -- "!" --- Dist=0.0003 for class "e j" -- "!" in 11.42 secs --- To-do=9 ncls=1 sing=1 nredo=2 2021-04-13 14:44:03 -- "d" --- Dist=0.0145 for word "!" -- "d" in 9.435 secs --- To-do=8 ncls=1 sing=2 nredo=2 2021-04-13 14:44:13 -- "a" --- Dist=0.1250 for word "!" 
-- "a" in 8.522 secs --- To-do=7 ncls=1 sing=3 nredo=2 2021-04-13 14:44:23 -- "###LEFT-WALL###" --- Dist=0.0182 for class "e j" -- "###LEFT-WALL###" in 8.379 secs Dist= 0.0 for word "!" -- "###LEFT-WALL###" in 16.19 secs Dist=0.0000 for word "d" -- "###LEFT-WALL###" in 8.048 secs Dist=0.4637 for word "a" -- "###LEFT-WALL###" in 7.529 secs ---------Bingo! Dist=0.4637 for word "a" -- "###LEFT-WALL###" ---------Merged 64754 sections in 19.00 secs; 3408.1 scts/sec --- Greedy-checking next 6 items Dist=0.0001 for class "a ###LEFT-WALL###" -- "b" in 7.143 secs Dist=0.0139 for class "a ###LEFT-WALL###" -- "f" in 8.318 secs Dist=0.0142 for class "a ###LEFT-WALL###" -- "h" in 8.104 secs Dist=0.0000 for class "a ###LEFT-WALL###" -- "c" in 3.733 secs Dist=0.0213 for class "a ###LEFT-WALL###" -- "g" in 4.576 secs Dist=0.0256 for class "a ###LEFT-WALL###" -- "i" in 5.645 secs --- Checking the done-list len=2 Dist=0.0000 for class "a ###LEFT-WALL###" -- "e" in 4.843 secs Dist=0.0062 for class "a ###LEFT-WALL###" -- "j" in 3.892 secs ---- Remaining count = 182636.0477091671 of 417669.0 for "###LEFT-WALL###" --- To-do=6 ncls=2 sing=2 nredo=4 2021-04-13 14:46:54 -- "b" --- Dist=0.0001 for class "a ###LEFT-WALL###" -- "b" in 9.050 secs Dist=0.0061 for word "!" -- "b" in 8.353 secs --- To-do=5 ncls=2 sing=3 nredo=4 2021-04-13 14:47:13 -- "f" --- Dist=0.0139 for class "a ###LEFT-WALL###" -- "f" in 11.43 secs Dist=0.0198 for class "e j" -- "f" in 5.133 secs Dist=0.0365 for word "!" -- "f" in 13.56 secs Dist=0.2381 for word "d" -- "f" in 4.999 secs Dist=0.0003 for word "b" -- "f" in 5.693 secs --- To-do=4 ncls=2 sing=4 nredo=4 2021-04-13 14:47:56 -- "h" --- Dist=0.0142 for class "a ###LEFT-WALL###" -- "h" in 10.31 secs Dist=0.0174 for class "e j" -- "h" in 5.736 secs Dist=0.0901 for word "!" -- "h" in 14.02 secs Dist=0.2265 for word "d" -- "h" in 4.433 secs Dist=0.0003 for word "b" -- "h" in 4.229 secs Dist=0.9304 for word "f" -- "h" in 9.015 secs ---------Bingo! 
Dist=0.9304 for word "f" -- "h" ---------Merged 68219 sections in 18.00 secs; 3789.9 scts/sec --- Greedy-checking next 3 items Dist=0.1650 for class "f h" -- "c" in 6.589 secs Dist=0.0211 for class "f h" -- "g" in 4.638 secs Dist=0.0508 for class "f h" -- "i" in 4.224 secs --- Checking the done-list len=4 Dist=0.1188 for class "f h" -- "e" in 3.754 secs Dist=0.0420 for class "f h" -- "j" in 3.911 secs Dist=0.0014 for class "f h" -- "a" in 4.463 secs Dist=0.0663 for class "f h" -- "###LEFT-WALL###" in 5.652 secs ---- Remaining count = 1465.8 of 187522.0 for "h" --- To-do=3 ncls=3 sing=3 nredo=6 2021-04-13 14:50:21 -- "c" --- Dist=0.1650 for class "f h" -- "c" in 9.284 secs xxx unchanged and uncached... Dist=0.0000 for class "a ###LEFT-WALL###" -- "c" in 7.680 secs Dist=0.0000 for word "!" -- "c" in 9.102 secs --- To-do=2 ncls=3 sing=4 nredo=6 2021-04-13 14:50:48 -- "g" --- Dist=0.0211 for class "f h" -- "g" in 7.101 secs Dist=0.0213 for class "a ###LEFT-WALL###" -- "g" in 9.327 secs Dist=0.0511 for word "!" -- "g" in 8.720 secs --- To-do=1 ncls=3 sing=5 nredo=6 2021-04-13 14:51:18 -- "i" --- Dist=0.0508 for class "f h" -- "i" in 7.774 secs Dist=0.0256 for class "a ###LEFT-WALL###" -- "i" in 8.618 secs Dist=0.3223 for word "!" -- "i" in 10.01 secs ---------Merged 76769 sections in 32.00 secs; 2399.0 scts/sec --- Greedy-checking next 0 items --- Checking the done-list len=6 Dist=0.0001 for class "! i" -- "e" in 7.111 secs Dist=0.0002 for class "! i" -- "j" in 4.298 secs Dist=0.3660 for class "! i" -- "a" in 4.482 secs ---------Bingo! Dist=0.3660 for class "! i" -- "a" ---------Merged 80912 sections in 42.00 secs; 1926.5 scts/sec Dist=0.0002 for class "! i a" -- "###LEFT-WALL###" in 8.065 secs Dist=0.0000 for class "! i a" -- "f" in 4.291 secs Dist=0.0943 for class "! i a" -- "h" in 4.338 secs ---- Remaining count = 9652.5 of 24657. 
for "i"
--- To-do=0 ncls=4 sing=4 nredo=8 2021-04-13 14:54:40 -- "()" ---
Finished greedy-agglomeration: 16 words assigned to 9 classes

TODO: Why are the distances.... different? Is clobber not working?
Are counts not being correctly updated?
Oh bug, typo in the new code ...
Oh, bug: it's always starting a new cluster .. why!? Because of bad type compare.
Oh optimization: distance recomputed when not needed.

merge seems strange: Dist=0.9549 for word "e" -- "j"
but got 122053 duals for (WordNode "e" )
got 106595 duals for (WordNode "j")
union length=202020 so very little overlap of the basis .. must be small counts!?
(format #t "duude ~A m=~A ~A cls=~A\n" single WA WB cls)

Current code expt-17
Start greedy-agglomeration of 12 words
Existing classes=0 singletons=0 done=0
--- To-do=12 ncls=0 sing=0 nredo=0 2021-04-19 22:59:01 -- "e" ---
--- To-do=11 ncls=0 sing=1 nredo=0 2021-04-19 22:59:14 -- "j" ---
Dist=0.9557 for word "e" -- "j" in 25.98 secs identical
---------Bingo! Dist=0.9557 for word "e" -- "j"
------ Create: Merged 202021 sections in 114.0 secs; 1772.1 scts/sec nsect=1783 nshape=200238
--- Greedy-checking next 10 items
Dist=0.0550 for class "e j" -- "!" in 34.13 secs
Dist=0.6614 for class "e j" -- "d" in 32.89 secs
---------Bingo! Dist=0.6614 for class "e j" -- "d" different and less Dist=0.5231
------ Extend: Merged 104280 sections in 60.00 secs; 1738.0 scts/sec
Dist=0.3100 for class "e j" -- "a" in 48.94 secs
---------Bingo! Dist=0.3100 for class "e j" -- "a" diff Dist=0.4161 so greater
------ Extend: Merged 73083 sections in 68.00 secs; 1074.7 scts/sec
Dist=0.0900 for class "e j" -- "###LEFT-WALL###" in 56.52 secs
Dist=0.6042 for class "e j" -- "b" in 36.73 secs
---------Bingo! Dist=0.6042 for class "e j" -- "b" diff Dist=0.6817 so greater
------ Extend: Merged 40401 sections in 48.00 secs; 841.69 scts/sec
Dist=0.5827 for class "e j" -- "f" in 52.08 secs
---------Bingo!
Dist=0.5827 for class "e j" -- "f" different Dist=0.4730 so less ------ Extend: Merged 39769 sections in 49.00 secs; 811.61 scts/sec Dist=0.6249 for class "e j" -- "h" in 58.82 secs ---------Bingo! Dist=0.6249 for class "e j" -- "h" diff Dist=0.5428 ------ Extend: Merged 36810 sections in 53.00 secs; 694.53 scts/sec Dist=0.6805 for class "e j" -- "c" in 63.55 secs ---------Bingo! Dist=0.6805 for class "e j" -- "c" diff Dist=0.5947 ------ Extend: Merged 32955 sections in 50.00 secs; 659.10 scts/sec Dist=0.3152 for class "e j" -- "g" in 57.47 secs ---------Bingo! Dist=0.3152 for class "e j" -- "g" diff Dist=0.3768 ------ Extend: Merged 8228 sections in 42.00 secs; 195.90 scts/sec Dist=0.1258 for class "e j" -- "i" in 57.40 secs --- Checking the done-list len=0 ---- Remaining count = 1375.5 of 694231.0 for "j" --- To-do=3 ncls=1 sing=0 nredo=9 2021-04-19 23:25:29 -- "!" --- Dist=0.0787 for class "e j" -- "!" in 71.03 secs --- To-do=2 ncls=1 sing=1 nredo=9 2021-04-19 23:26:44 -- "###LEFT-WALL###" --- Dist=0.0960 for class "e j" -- "###LEFT-WALL###" in 57.67 secs Dist= 0.0 for word "!" -- "###LEFT-WALL###" in 17.98 secs --- To-do=1 ncls=1 sing=2 nredo=9 2021-04-19 23:28:03 -- "i" --- Dist=0.1258 for class "e j" -- "i" in 50.40 secs Dist=0.3223 for word "!" -- "i" in 9.925 secs ---------Bingo! Dist=0.3223 for word "!" -- "i" ------ Create: Merged 76769 sections in 64.00 secs; 1199.5 scts/sec nsect=40979 nshape=35790 --- Greedy-checking next 0 items --- Checking the done-list len=9 Dist=0.0001 for class "! i" -- "e" in 6.509 secs Dist=0.0002 for class "! i" -- "j" in 5.446 secs Dist=0.0283 for class "! i" -- "d" in 4.782 secs Dist=0.2077 for class "! i" -- "a" in 5.683 secs Dist=0.0017 for class "! i" -- "b" in 5.224 secs Dist=0.0000 for class "! i" -- "f" in 5.820 secs Dist=0.0007 for class "! i" -- "h" in 5.168 secs Dist= 0.0 for class "! i" -- "c" in 5.144 secs Dist=0.0008 for class "! i" -- "g" in 4.424 secs ---- Remaining count = 9652.5 of 24657. 
for "i" --- To-do=0 ncls=2 sing=1 nredo=11 2021-04-19 23:31:13 -- "()" --- Finished greedy-agglomeration: 11 words assigned to 2 classes =============================== Try to repro the old bug so can write unit test for it. Cause I don't remember the bug.. 26 March ae9603ff864c1b94926621e53a603790b94ba8cb c426c23923f2e99aa23c1920c9c279a218b3c19f (define var (Variable "$api-right-star")) (define term (wss 'make-pair (WordNode "d") var)) (define b (Bind (TypedVariable var (Type "EvaluationLink")) term term)) (use-modules (opencog exec)) (define pca (make-pseudo-cset-api)) (define psa (add-pair-stars pca)) (define wsv (make-shape-vec-api)) (define wss (add-pair-stars wsv)) (wsv 'fetch-pairs) (define wsc (add-support-compute wss)) (wss 'left-basis-size) (wsc 'cache-all) PatternLink with body (EvaluationLink (PredicateNode "*-word-shape pair-*") (WordNode "d") (VariableNode "$api-right-star")) (EvaluationLink (PredicateNode "*-word-shape pair-*") (WordNode "d") (EvaluationLink (PredicateNode "*-shape-*") (WordNode "d") (Connector (WordNode "e") (ConnectorDir "-")) (Connector (WordNode "j") (ConnectorDir "+")) (Connector (WordNode "g") (ConnectorDir "+")) (Connector (VariableNode "$connector-word") (ConnectorDir "+")) (Connector (WordNode "!") (ConnectorDir "+")) )) (EvaluationLink (PredicateNode "*-word-shape pair-*") (WordNode "d") (EvaluationLink (PredicateNode "*-shape-*") (WordNode "d") (Connector (VariableNode "$connector-word") (ConnectorDir "-")) (Connector (WordNode "g") (ConnectorDir "+")) (Connector (WordNode "g") (ConnectorDir "+")))) (EvaluationLink (PredicateNode "*-word-shape pair-*") (WordNode "d") (EvaluationLink (PredicateNode "*-shape-*") (WordNode "d") (Connector (WordNode "b") (ConnectorDir "-")) (Connector (VariableNode "$connector-word") (ConnectorDir "-")) (Connector (WordNode "g") (ConnectorDir "-")) (Connector (WordNode "e") (ConnectorDir "+")))) (EvaluationLink (PredicateNode "*-word-shape pair-*") (WordNode "b") (EvaluationLink (PredicateNode 
"*-shape-*") (WordNode "b") (Connector (WordNode "###LEFT-WALL###") (ConnectorDir "-")) (Connector (VariableNode "$connector-word") (ConnectorDir "-")) (Connector (WordNode "a") (ConnectorDir "+")))) OK unit test is in opencog/atomspace#2812 -------------------------- (make-concatenation "Sorry, method make-cluster not available!")' run-common/cogserver-gram.scm run-common/marginals-mst-shape.scm: gram-optim.scm -------------------------------- (define pca (make-pseudo-cset-api)) (define psa (add-pair-stars pca)) (define wsv (add-shape-vec-api psa)) (define wss (add-pair-stars wsv)) (define cac (direct-sum psa wss)) (define csc (add-pair-stars cac)) (csc 'fetch-pairs) (define gsc (add-cluster-gram csc)) -------------------------- (define pca (make-pseudo-cset-api)) (define csc (add-covering-sections pca)) (csc 'fetch-pairs) (csc 'explode-sections) ; neede both on load and mmt compute (define gsc (add-cluster-gram csc)) (gram-classify-greedy-discrim gsc 0.25 4) (define wrd-lst (cog-get-atoms 'WordNode)) (define w (car wrd-lst)) ((add-support-api csc) 'right-count w) (cover-stars 'right-wildcard (WordNode "a")) Has two different answers, and it depends ... FIXED bug Does clobber clobber the direct suum contents? Yes it does failed to fetch incoming set correctly for the wild cards. How? Because direct sum not called. (define wc (EvaluationLink (PredicateNode "*-Direct Sum Wild (cset⊕cross-section)") (WordNode "a") (AnyNode "right-wild-direct-sum"))) (cog-keys wc) (fetch-incoming-set (PredicateNode "*-Direct Sum Wild (cset⊕cross-section)")) support was never computed on direct sum! Why? (at least, not wildcards ... 
it has Norm Key and MMT key

(define btc (batch-transpose csc))
(btc 'mmt-marginals)
(print-matrix-summary-report csc)

--------------------------
(define pca (make-pseudo-cset-api))
(define csc (add-covering-sections pca))
(csc 'fetch-pairs)
(csc 'explode-sections) ; needed both on load and mmt compute
(define gsc (add-cluster-gram csc))
(gram-classify-greedy-discrim gsc 0.25 4)

-----
(accumulate-count LLOBJ ACC PAIR FRAC NOISE)
'get-section SHAPE return the matching section

bug: if stopped and restarted, merge-into-cluster fails to give cluster FIXED

(cog-report-counts)
$3 = ((PredicateNode . 25) (ListLink . 75701) (MemberLink . 12) (AnyNode . 8) (Connector . 28) (ConnectorDir . 2) (ConnectorSeq . 522935) (Section . 251631) (ShapeLink . 511484) (CrossSection . 865232) (VariableNode . 1) (EvaluationLink . 16) (TypeNode . 2) (AnchorNode . 1) (SchemaNode . 1) (RocksStorageNode . 1) (WordNode . 12) (WordClassNode . 2))

So, why aren't they all deleted?
Why are there more sections than we started with?

(define sex (cog-get-atoms 'Section))
(define (is-zero? cnt) (< cnt 1.0e-10))
(define zer (filter (lambda (s) (is-zero? (cog-count s))) sex))
(length zer) ; 81182 where are these from?
(define nz (filter (lambda (s) (not (is-zero? (cog-count s)))) sex))
(length nz) ; 170449
(define nc (filter (lambda (s) (not (eq? 'WordClassNode (cog-type (gar s))))) nz))
(length nc) ; 104033

OK, so it's not closing up.... Argh.

(define pca (make-pseudo-cset-api))
(define csc (add-covering-sections pca))
(csc 'fetch-pairs)
(define sex (cog-get-atoms 'Section))
(define (is-zero? cnt) (< cnt 1.0e-10))
(define zer (filter (lambda (s) (is-zero? (cog-count s))) sex))
(length zer) ; 0 so old stuff is ok.

cset-merge.scm -- delete this.
shape-vec.scm ---------- Also delete ShapeLinks w/o any incoming's

Bingo! Dist=0.6614 for class "e j" -- "d"
Crap.

is-similar?
(define (get-cosine wa wb) (pcos 'right-cosine wa wb))
(is-similar?
get-cosine CUTOFF WORD-A WORD-B)) (define (get-mi wa wb) (pmi 'mmt-fmi wa wb)) (is-similar? get-mi CUTOFF WORD-A WORD-B)) cosine.scm compute-right-cosine (compute-right-product ROW-A ROW-B) (prod-obj 'right-count (list ROW-A ROW-B) (define prod-obj (add-support-compute (add-tuple-math star-obj * GET-CNT) (define (sum-right-count ITEM) (sum-count (star-obj 'right-stars ITEM))) fold-api.scm seems OK ... make-merger ... 'merge-function called in gram-agglo provides word-or-class as argument to 'merge-function uses that word-or-class in merge-predicate uses word-or-class in tuple object built on pseudo-cset... merge-predicate defined by make-merger I don't get it ... how do we fake out pseudo-cset to work with WordClass's ??? Does it correctly respond to stars and duals requests??? Yes, because add-pair-stars does a pattern match and it works.. changes to make-gram-class-api: -- need to modify right-stars to return either-or for item. -- also left-duals... No ... that's not it... ... need tuple to align correctly ... 'right-stars is right-star-union right-star-union takes tuple of rows... calls get-right-union to get 'right-duals for each row in the tuple. .. this is just a big union set of all columns. calls (get-right-tuple righty ROW-TUPLE) calls 'get-pair for row-col pairs from get-right-union net: returns aligned tuples... cause 'get-pair aligns them for merged connectors, 'get-pair must ... ugh. That won't work either. go back to earlier design, and do a post-merge sweep with multi-crosses producing one unified section. (merge-section LLOBJ ACC DONOR FRAC NOISE MRG-CON) (merge-connectors LLOBJ CLS) (load-atoms-of-type 'Member) (define all-stars (csc 'right-stars (WordClass "e j"))) (cog-link 'Section (Word "j") (ConnectorSeq (Connector (Word "d") (ConnectorDir "-")) (Connector (Word "b") (ConnectorDir "-")) (Connector (Word "e" ) (ConnectorDir "-")) (Connector (Word "b" ) (ConnectorDir "-")) (Connector (Word "j" ) (ConnectorDir "-")) (Connector (Word "!" 
) (ConnectorDir "+"))))

merge (WordNode "e" into (WordClassNode "e j")
in clnclusion: no:yes 146 : 547
in conculsion handled 693 of 1783 for (WordNode "e"
in clnclusion: no:yes 266 : 443
in conculsion handled 709 of 1783 for (WordNode "j"

so there are 1783 sections? hard to believe, but OK.

Given acc section:
* does any ctr contain word to merge?
* if so, blow it up to multiple x-sections
* for all of the blow-ups that have a matching xsect in the vector perform replacement to make new sect w/ rewrite-conseq uhh delete old sect.

Given acc x-section:
* If it has zero connectors of the type, then just recreate the original section, and transfer the count.
* If it has more than one... then recreate the donor section, and give it to revise-section which will... build what we want, and will xfer counts to it.
* Then we also need to recreate the cross...

OK, on conclusion:
(cog-report-counts)
$3 = ((PredicateNode . 25) (ListLink . 75701) (MemberLink . 12) (AnyNode . 8) (Connector . 28) (ConnectorDir . 2) (ConnectorSeq . 406773) (Section . 215737) (ShapeLink . 511484) (CrossSection . 765043) (VariableNode . 1) (EvaluationLink . 16) (TypeNode . 2) (AnchorNode . 1) (SchemaNode . 1) (RocksStorageNode . 1) (WordNode . 12) (WordClassNode . 2))

which is still outrageous. so wtf...

(define sex (cog-get-atoms 'Section))
(define (is-zero? cnt) (< cnt 1.0e-10))
(define zer (filter (lambda (s) (is-zero? (cog-count s))) sex))
(length zer) ; 211702 Huh?
(define top (filter (lambda (s) (< 0 (cog-incoming-size s))) sex))
(length top) ; 0
(for-each cog-delete! zer)

------------
fresh start:
(cog-report-counts)
$2 = ((PredicateNode . 25) (ListLink . 75701) (AnyNode . 8) (Connector . 24) (ConnectorDir . 2) (ConnectorSeq . 75688) (Section . 80832) (ShapeLink . 511484) (CrossSection . 620774) (VariableNode . 1) (EvaluationLink . 14) (TypeNode . 2) (AnchorNode . 1) (SchemaNode . 1) (RocksStorageNode . 1) (WordNode .
12)) (define disc (make-discrim gsc 0.25 4 4)) (disc 'merge-function (Word "e") (Word "j")) (define xes (filter (lambda (s) (eq? 'CrossSection (cog-type s))) rs)) (LLOBJ 'get-section donor) opencog::SchemeSmob::equalp_misc /home/ubuntu/src/atomspace/opencog/guile/SchemeSmob.cc:162 162 if (**av == **bv) return SCM_BOOL_T; SCM_SMOB_DATA(x) (SCM_SMOB_DATA_1 (x)) SCM_SMOB_DATA_1(x) (SCM_SMOB_DATA_N ((x), 1)) SCM_SMOB_DATA_N(x, n) (SCM_CELL_WORD ((x), (n))) SCM_CELL_WORD(x, n) SCM_GC_CELL_WORD ((x), (n)) SCM_GC_CELL_WORD(x, n) (SCM_UNPACK (SCM_GC_CELL_OBJECT ((x), (n)))) #define SCM_GC_CELL_OBJECT(x, n) (((SCM *)SCM2PTR (x)) [n]) SCM_SMOB_PREDICATE(tag, obj) SCM_HAS_TYP16 (obj, tag) SCM_HAS_TYP16(x, tag) (SCM_HAS_HEAP_TYPE (x, SCM_TYP16, tag)) SCM_TYP16(x) (0xffff & SCM_CELL_TYPE (x)) SCM_HAS_HEAP_TYPE(x, type, tag) (SCM_NIMP (x) && type (x) == (tag)) SCM_CELL_TYPE(x) SCM_CELL_WORD_0 (x) print SchemeSmob::cog_misc_tag $4 = 5751 print *a $2 = {scm_unused_field = 119 'w'} its 0x77 print *(long*) a $6 = 71287 (gdb) print *(long*) b $7 = 71287 print 0xffff & 71287 $8 = 5751 OK, great! so both a & b should be smobs... (gdb) print (scm_t_cell) a $6 = {word_0 = 0x5555c143f410, word_1 = 0x5555555bbd80} (gdb) print (scm_t_cell) b Invalid cast. (gdb) print *(SCM*) 0x5555a8374de8 $10 = (SCM) 0x5555accda9b0 (gdb) print *(ValuePtr*) 0x5555555bbd80 $11 = std::shared_ptr (use count 7, weak count -1) = { get() = 0x0} (gdb) print *(ValuePtr*) 0x5555a8374de8 $12 = print /x *(long *)(b+8) $13 = 0x5555accdbab0 print *(ValuePtr*) 0x5555accdbab0 std::shared_ptr (use count 2, weak count 3) = { get() = 0x555555e793c0} print /x *(long *)(a+8) $15 = 0x5555d0d96930 (gdb) print *(ValuePtr*) 0x5555d0d96930 $16 = std::shared_ptr (empty) = {get() = 0x0} Ohhhhhh OK. -------------------------- (define disc (make-discrim gsc 0.25 4 4)) (disc 'merge-function (Word "e") (Word "j")) (define sex (cog-get-atoms 'Section)) (define (is-zero? cnt) (< cnt 1.0e-10)) (define zer (filter (lambda (s) (is-zero? 
(cog-count s))) sex)) (length zer) (use-modules (ice-9 local-eval)) (use-modules (ice-9 readline)) (define (break env) (display "yo duude> ") (let ((input (read))) (unless (or (eq? '. input) (eof-object? input)) (catch #t (lambda () (local-eval input env)) (lambda (key . args) (format #t "Oh no, Mr. Bill: ~A ~A\n" key args) *unspecified*)) (break env)))) (define (break env) (let ((input (readline "yo duude> "))) (unless (or (eq? '. input) (eof-object? input)) (catch #t (lambda () (define sexpr (call-with-input-string input read)) (format #t "~A\n" (local-eval sexpr env))) (lambda (key . args) (format #t "Oh no, Mr. Bill: ~A ~A\n" key args) *unspecified*)) (break env)))) (define (do-stuff) (define x 42) (format #t "start\n") (define env (the-environment)) (break env) (format #t "end ~A\n" x)) (define (bogo) (define y 56) (do-stuff)) (format #t "it is ~A\n" x) (let* ((rs (LLOBJ 'right-stars ROW)) (goo (filter (lambda (s) (cog-atom? s)) rs))) (format #t (length rs) (length goo) ) (define zer (filter (lambda (s) (is-zero? (cog-count s))) sex)) -------------------------- (define disc (make-discrim gsc 0.25 4 4)) (disc 'merge-function (Word "e") (Word "j")) (define es (gsc 'right-stars (Word "e"))) (length es) ;891 (define (is-zero? cnt) (< cnt 1.0e-10)) (define (zer lst) (length (filter (lambda (s) (is-zero? (cog-count s))) lst))) (zer es) (define zs 0) (define zx 0) (define (foo-sec SEC) (define xes (gsc 'get-cross-sections SEC)) (for-each (lambda (xst) (when (is-zero? (gsc 'get-count xst)) (set! zx (+ 1 zx)))) xes)) (define (foo-xes XST) (define sct (gsc 'get-section XST)) (when (is-zero? (gsc 'get-count sct)) (set! zs (+ zs 1)))) (define (fooli lst) (for-each (lambda (ITEM) (cond ((eq? 'Section (cog-type ITEM)) (foo-sec ITEM)) ((eq? 'CrossSection (cog-type ITEM)) (foo-xes ITEM)))) lst)) (define xd 0) (define (bar-xes XST) (define sct (gsc 'get-section XST)) (when (and (cog-atom? sct) (is-zero? (gsc 'get-count sct))) (set! xd (+ xd 1)) (cog-delete! 
sct)))

Argh, above reveals that 'get-section is creating empty sections.

(define sex (cog-get-atoms 'Section))
(define xes (cog-get-atoms 'CrossSection))

.. there are no zero xes! (and there are fewer xes than before)
.. only sex has zeros and there are more sex's so we must be creating them.

(define (nonsex lst)
   (filter (lambda (XST)
      (is-zero? (gsc 'get-count (gsc 'get-section XST)))) lst))
(define zsex (nonsex xes))
(length zsex) ; 541318 Wow!

these are xsections whose matching sec has no count! How is that possible?

--------------------------
Before merge:
(ConnectorSeq . 75688) (Section . 80832) (ShapeLink . 511484) (CrossSection . 620774)

(define disc (make-discrim gsc 0.25 4 4))
(disc 'merge-function (Word "e") (Word "j"))

After merge:
(ConnectorSeq . 188592) (Section . 121758) (ShapeLink . 511484) (CrossSection . 596375)

wtf...add-covering-sections

donor-connector-set -- donor crosses that donate to sections
tbd same but sections that donate to crosses need also shapes revision.

This won't work:
; Create a list of all CrossSections, obtainable from Sections in
; the all-stars vector, that donated WRD to the cluster. These
; CrossSections (obviously corresponding to Sections in the stars)
; have WRD as the head of the Shape.
(define donor-sex
   (append-map
      (lambda (ITEM)
         (if (eq? 'Section (cog-type ITEM))
            (let ((donor (cog-link 'Section WRD (gdr ITEM))))
               (if (nil? donor) '()
                  (LLOBJ 'get-cross-sections donor)))
            '()))
      all-stars))

when merging e/j, should fractionate the e so start-section should ...
(Cross k,l,m (Shape (Class ej) FRAC))
(Cross k,l,m (Shape (Word e) 1-FRAC))

LLOBJ 'get-cross-sections ... which is what the long revise-section does

copy to done with start-cluster todo merge-into-cluster DONE
(merge-crosses LLOBJ CLS PAIR-A frac-to-merge NOISE))))

ej -> 5 sections (3 + 2 more ... + ??? con merge??
e-> 5 or 3 ? e-klm as before, plus:
ab(ej) (ej)gh abj(!) jgh(!)
so ..
3
(e, abc) + (j, abc) -> ([ej], abc)
(e, dgh) + (j, dgh) -> ([ej], dgh)
(e, klm) + none -> frac([ej], klm) + (1-frac)(e, klm)
none + (j, abe) -> frac([ej], abx) + (1-frac)(j, aby)
none + (j, egh) -> frac([ej], zgh) + (1-frac)(j, wgh)

(define xes-b-ej-avc #f)
(define xes-k-ej-vlm #f)
(define xes-k-e-vlm #f)

; The cross-sections on e should be:
; (1-p) * [e, ]
; (1-p) * [e, ]

merge-cross is creating this:
[e, <{ej}, vgh>]
[e, <{ej}, abv>]
from the donor (j, egh) and (j, abe)
that is from ({ej}, egh) and ({ej}, abe)
... which is ok for the basic case, but we want the germ to be ej too.

so if start; is germ==e then ej
if merge-into-cluster then ... ?
uhh, no should only be reshaping the merged guy, who already has the correct counts cause accum-count set them up.

in reshape-merge if MRG is x-sect then take the xsect turn it into sect, then take that sect and replace germ by CLS.
get-section or make-section?

reshape-merge
({ej}, abe) -> [e, <{ej}, abv>] should be zero but p(1-p)
({ej}, egh) -> [e, <{ej}, vgh>] should be zero
(j, abe) -> [e, ] OK (1-p) xes-e-j-abv but (1-p)(1-p)
(j, egh) -> [e, ] OK (1-p) xes-e-j-vgh

(j, abe) has count reduced during section merge... then handed to the recrosses...
sent in (j, abe) -> [e, ] -> recross [e, <{ej}, abv>]
xdonor already has (1-p)=15.75
xmr is [e, <{ej}, abv>] with zero
XST is [e, ] with 15.75 (acting as donor to xmr)

if connector in class, then don't recross? why?
Because that connector already showed in cross, and was handled already.

Next:
({ej}, abe) should be zero
({ej}, egh) should be zero
.. zero but undeleted ...
({ej}, abc)
({ej}, dgh)
({ej}, klm)

How? Easiest is to ({ej}, abe) -> [e, <{ej}, abv>] then regerm, then unexplode...
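
A minimal sketch of the count bookkeeping in the five merge rules above, done on plain association lists rather than on AtomSpace atoms. It assumes the rule reads as: a disjunct seen on both words goes to the cluster in full, while a disjunct seen on only one word gives `frac` of its count to the cluster and keeps `1-frac`. The helper names `count-of` and `merge-vectors` and all input counts are hypothetical, and the connector rewriting (abe -> ab{ej}) is not attempted here.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical helper: count stored under KEY, or 0 if absent.
(define (count-of alist key)
  (let ((pr (assoc key alist)))
    (if pr (cdr pr) 0)))

;; Hypothetical helper: merge two count vectors, each an alist of
;; (disjunct . count). Returns one entry per disjunct, of the form
;; (disjunct cluster-count remainder-on-a remainder-on-b).
(define (merge-vectors vec-a vec-b frac)
  (define keys
    (delete-duplicates (append (map car vec-a) (map car vec-b))))
  (define (split key)
    (let* ((ca (count-of vec-a key))
           (cb (count-of vec-b key))
           ;; Shared disjuncts merge in full; others only by FRAC.
           (f (if (and (< 0 ca) (< 0 cb)) 1 frac)))
      (list key
            (* f (+ ca cb))
            (* (- 1 f) ca)
            (* (- 1 f) cb))))
  (map split keys))

;; Made-up counts for word e and word j.
(define vec-e '((abc . 10) (dgh . 4) (klm . 8)))
(define vec-j '((abc . 6) (dgh . 2) (abe . 21) (egh . 12)))

(for-each
  (lambda (row) (format #t "~A\n" row))
  (merge-vectors vec-e vec-j 0.25))
```

With these made-up inputs, the `abe` row leaves 0.75 * 21 = 15.75 on word j, which is the same (1-p) value quoted for the donor above.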
distinct-type a ej vbc OK from ({ej}, abc) 1.0 checked b ej avc c ej abv d ej vgh OK from ({ej}, dgh) 1.0 g ej dvh h ej dgv k ej vlm OK from ({ej}, klm) p checked l ej kvm m ej klv k e vlm OK from (e, klm) 1-p checked l e kvm m e klv a j vbe OK from (j, abe) 1-p checked b j ave e j abv e j vgh OK from (j, egh) 1-p checked g j evh h j egh p * ({ej}, ab{ej}) missing created... p * ({ej}, {ej}gh) ej ej abv ej ej vgh (ej, abc) (ej, klm) (ej, dgh) (ej, ej gh) (ej, ab ej) ---------- failed to kill (recrreated ({ej}, abe) ({ej}, egh) recreated during merge... Hang on. first merge should have created crosses picking up f ...!? (f, abe) -> [e, ] (f, ab{ej}) -> [{ej}, ] where'd this go?? (f, {ej}gh) if point is in class... (test-approximate (* frac cnt-f-abe) (cog-count sec-f-abej) epsilon) (test-approximate (* frac cnt-f-egh) (cog-count sec-f-ejgh) epsilon) xes-ej-ej-abv [g, ] ... from (j, egh) ... where'd it go? already whacked?? no... its just not flat ... -------------- wild: (f, {ej}gh) 6.25 [a, ] too high, from (f, abe) 23.25 ------ balance-recrosses fails to flatten... flatten-resects called from merge-resects balance-recrosses .. given donor section, loop over crosses, Hang on .... ({ej}, abe) is created and not eliminated. merge-recrosses handles the flat case .. how!? never mind ----------- Oh... part 1 merge: (f, abe) -> [e, ] -> [{ej}, ] -> (f, ab{ej}) 7.75 7.75 = (* frac cnt-f-abe) tested Then in linear part of second merge, so ({ej}, ab{ej}) 5.25 + (f, ab{ej}) 7.75 -> ({ej}, ab{ej}) 13.0 5.25 = (* frac cnt-j-abe) 13.0 = (* frac (+ cnt-f-abe cnt-j-abe)) = 13.0 also tested.... So now start doing the shapes... We find ({ej}, abe) where is this from? came from flatten-section? MRG came from reshape-merge called with ({ej}, abe) 5.8125 linear (f,abe) merged to zero ({ej}, abe) !!! This was done near line 526 of gram-projective.scm problem: this is the left-over (f,abe) from the part-1 merge it shouldn't be going in ... 
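
A worked check of the arithmetic quoted above. The notes never state the inputs, so `frac` = 0.25, `cnt-f-abe` = 31 and `cnt-j-abe` = 21 are assumptions, picked only because they reproduce the quoted 7.75, 5.25, 13.0, 23.25 and 5.8125. The other variable names are hypothetical labels.

```scheme
;; Assumed inputs (not stated in the notes, only inferred from them).
(define frac 0.25)
(define cnt-f-abe 31)
(define cnt-j-abe 21)

;; Part-1 merge: the merged fraction of each section.
(define merged-f (* frac cnt-f-abe))   ; lands on (f, ab{ej})
(define merged-j (* frac cnt-j-abe))   ; lands on ({ej}, ab{ej})

;; Linear part of the second merge: the two are summed.
(define linear (* frac (+ cnt-f-abe cnt-j-abe)))

;; What part 1 left behind on (f, abe), and the slice of that
;; left-over that the second merge moves onto ({ej}, abe).
(define left-f (* (- 1 frac) cnt-f-abe))
(define stray (* frac left-f))

(for-each
  (lambda (pr) (format #t "~A = ~A\n" (car pr) (cdr pr)))
  (list (cons "(f, ab{ej}) after part 1" merged-f)
        (cons "({ej}, ab{ej}) after part 1" merged-j)
        (cons "({ej}, ab{ej}) after linear merge" linear)
        (cons "(f, abe) left-over" left-f)
        (cons "({ej}, abe) stray" stray)))
```

The stray value is frac * (1-frac) * cnt-f-abe, which is the p(1-p) term that the notes say should not be going in.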
get-left-union (star-obj 'left-duals row) get-pair OK. So given ({ej}, abe) it has donor (f,abe) as both passed to reshape-merge what should we do with this? 1) all of count of (f,abe) should go to ({ej}, ab{ej}) 2) all of count of ({ej}, abe) should go to ({ej}, ab{ej}) which means basically all of the original (f,abe) gets reassigned to there. So line 243 is wrong .... Again: coming in, have, after the second linear merge, that ({ej}, ab{ej}) = 13.0 = (* frac (+ cnt-f-abe cnt-j-abe)) so what to do with the new ({ej}, abe) ? old dogma says set it to zero. But we need it to be non-zero because of initial cluster creation. So now, just add it. which means ({ej}, ab{ej}) = 13.0 + ({ej}, abe) = 13.0 + p*(1-p) cnt-f-abe = (* frac (+ cnt-f-abe cnt-j-abe)) + (* frac (1-frac) cnt-f-abe) = yuck. 26.5625 doc chkok [{ej}, ] has 7.75 on it ... came from (f, ab{ej}) in the first pass. TEST F1 ... --- a/scm/shape-project.scm +++ b/scm/shape-project.scm @@ -322,9 +322,11 @@ XXX describe me. (define donor-type (cog-type DONOR)) (when (equal? 'Section donor-type) - (if (flatten-section? LLOBJ GLS MRG) - (rebalance-count LLOBJ DONOR (LLOBJ 'get-count DONOR)) - (merge-recrosses LLOBJ GLS DONOR FRAC NOISE))) + (when (not (flatten-section? LLOBJ GLS MRG)) + (merge-recrosses LLOBJ GLS DONOR FRAC NOISE) + (rebalance-count LLOBJ MRG (LLOBJ 'get-count MRG))) + (rebalance-count LLOBJ DONOR (LLOBJ 'get-count DONOR)) + ) (when (equal? 'CrossSection donor-type) (merge-resects LLOBJ GLS W MRG DONOR)) (* frac1 (+ cnt-j-abe cnt-f-abe)) = 13.0 (* frac2 (- 1 frac1) cnt-f-abe) = 8.1375 7.75 = (* frac1 cnt-f-abe) after first merge, ({ej}, ab{ej}) has cnt = 5.25 and (f, ab{ej}) has cnt = 7.75 so total of 13.0 on the linear. ------------------- [e, ] [e, ] [e, ] [e, ] unwanted: .. esp since j was creator! [e, <{ej}, v{ej}h>] [e, <{ej}, {ej}vh>] [e, <{ej}, {ej}bv>] [e, <{ej}, vb{ej}>] merging e first, then j, [{ej}, ] stripped to zero as expected... 
[e, ] [e, <{ej}, v{ej}h>] < ---- this [e, ] [e, ] [e, merge-resects xmr=[{ej}, ] resect= (j, e{ej}h) germ=j XDON=[e, ] branch to flatten flatten-resects mgs = ({ej}, e{ej}h) flatten-section? LLOBJ GLS MRG j ebe eeh -------------------------- [{ej}, <{ej}, {ej}bv>] [{ej}, <{ej}, vb{ej}>] [{ej}, <{ej}, v{ej}h>] [{ej}, <{ej}, {ej}vh>] [{ej}, ] <-- No unwanted. .. created in linear merge [{ej}, ] [{ej}, ] [{ej}, ] (f, {ej}b{ej}) from [{ej}, ] no recross (accumulate-count LLOBJ flat MRG FRAC NOISE) 335 not in merge-recrosses not in flatten-resects [e, ] [e, ] [e, ] [e, ] \basewidth=0.5em \small For spacing, setting `basewidth=0.45em` will reduce the inter-letter spacing from a ridiculous `0.6em` to a more compact width, all while preserving the fixed width (whereas `columns=fullflexible` wrecks the spacing). -------------------------- . ~/run/expt-18/config/0*sh . ~/run/expt-18/config/4*sh cd data rm -r gram-25-junk.rdb cp -pr shape.rdb gram-25-junk.rdb guile -l /home/ubuntu/src/learn/run-common/cogserver.scm (use-modules (srfi srfi-1)) (define pca (make-pseudo-cset-api)) (define csc (add-covering-sections pca)) (csc 'fetch-pairs) (csc 'explode-sections) (define gsc (add-cluster-gram csc)) (gram-classify-greedy-discrim gsc 0.25 4) -------------------------- Before merge: (ConnectorSeq . 75688) (Section . 80832) (ShapeLink . 511484) (CrossSection . 620774) (define disc (make-discrim gsc 0.25 4 4)) (disc 'merge-function (Word "e") (Word "j")) ----- Create: Merged 404042 sections in 434.0 secs; 930.97 scts/sec Revised shapes Whoops. Way too slow. Ouch. After merge: (ConnectorSeq . 310032) (Section . 51904) (ShapeLink . 793814) (CrossSection . 729722) (for-each cog-extract! (cog-get-atoms 'ShapeLink)) (for-each cog-extract! (cog-get-atoms 'ConnectorSeq)) (ConnectorSeq . 119999) (Section . 51904) (ShapeLink . 630318) (CrossSection . 729722) (for-each cog-extract! (cog-get-atoms 'CrossSection)) (for-each cog-extract! 
(cog-get-atoms 'ShapeLink)) (csc 'clobber) (csc 'explode-sections) (Section . 51904) (ShapeLink . 339137) (CrossSection . 410724) Hmm why is that different? ??? Well, some of the sections were not written out... (csc 'fetch-pairs) -------------------------- Before merge: (ConnectorSeq . 75688) (Section . 80832) (ShapeLink . 511484) (CrossSection . 620774) (define gsc (add-cluster-gram csc)) (gram-classify-greedy-discrim gsc 0.25 4) After: (ConnectorSeq . 81800) (Section . 6711) (ShapeLink . 173569) (CrossSection . 176020) (check-sections csc epsilon) (check-crosses csc epsilon) One unbalanced section, in total. (c, cba!)21.7514 vs [a, ]16.9337 one undeleted cross-section: [!, ] (define xes (cog-get-atoms 'CrossSection)) (csc 'explode-sections) after: (ConnectorSeq . 81800) (Section . 6711) (ShapeLink . 205930) (CrossSection . 211762) Sooo ... merge made a lot of crosses disappear... (define nxes (cog-get-atoms 'CrossSection)) (define old (make-aset-predicate xes)) (define news (atoms-subtract nxes xes)) wtf explode created an unbalaced cross! (define bad (Section (WordClassNode "e j") (ConnectorSeq (Connector (WordClassNode "e j") (ConnectorDir "-")) (Connector (WordNode "g") (ConnectorDir "+")) (Connector (WordNode "g") (ConnectorDir "+"))))) was everything stored? Yes. (define gtv (make-afunc-cache cog-count)) (for-each gtv (cog-get-atoms 'Section)) (load-atoms-of-type 'Section) (for-each (lambda (atm) (define diff (- (gtv atm) (cog-count atm))) (if (< 1.0e-6 (abs diff)) (format #t "fail at ~A ~A" (gtv atm) atm))) (cog-get-atoms 'Section)) Are there CrossSections in storage? (cog-rocks-stats storage-node) cog-rocks-load-atomspace (load-atomspace) (ConnectorSeq . 75688) (Section . 80832) (ShapeLink . 511484) huh ... but no cross-sections! (for-each cog-delete! (cog-get-atoms 'ShapeLink)) who stored these? -------------------------- try again . ~/run/expt-18/config/0*sh . 
~/run/expt-18/config/4*sh
cd data
rm -r gram-25-junk.rdb; cp -pr shape.rdb gram-25-junk.rdb
guile -l /home/ubuntu/src/learn/run-common/cogserver.scm

(use-modules (srfi srfi-1))
(define pca (make-pseudo-cset-api))
(define csc (add-covering-sections pca))
(csc 'fetch-pairs)
(csc 'explode-sections)
(define gsc (add-cluster-gram csc))
(gram-classify-greedy-discrim gsc 0.25 4)

--------------------------
Before merge:
(ConnectorSeq . 75688) (Section . 80832) (ShapeLink . 511484) (CrossSection . 620774)

(define gsc (add-cluster-gram csc))
(define disc (make-discrim gsc 0.25 4 4))
(disc 'merge-function (Word "e") (Word "j"))

------ Create: Merged 202021 sections in 195.0 secs; 1036.0 scts/sec
------ Create: Revised 202021 shapes in 515.0 secs; 392.27 scts/sec
------ Create: cleanup 1 in 281.0 secs; 0.0036 ops/sec

(ConnectorSeq . 119999) (Section . 51904) (ShapeLink . 630318) (CrossSection . 729722)

(check-sections csc 1.0e-6)
Same failing ...
(check-sections csc 1.0e-6)

(define (check-balance LLOBJ)
   (define epsilon 1.0e-6)
   (filter
      (lambda (sect)
         (define scnt (cog-count sect))
         (any
            (lambda (cross)
               (define diff (- scnt (cog-count cross)))
               (< epsilon (abs diff)))
            (LLOBJ 'get-cross-sections sect)))
      (cog-get-atoms 'Section)))

(define bad (check-balance csc))
(length bad)
$6 = 74
now 42 ... now zero! Yayyy!

------------------------------
(filter-type (WordClass "e j") 'CrossSection)
[{ej}, <{ej}, abv>]
[{ej}, ] <---- unwanted
[{ej}, ]
[{ej}, <{ej}, vgh>]
[{ej}, <{ej}, klv>]
[{ej}, ] -> (f, kl{ej}) should have been merged.
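
The check-balance test above can be tried without an AtomSpace. A minimal sketch, assuming each section is reduced to a record holding its own count followed by the counts on its cross-sections; the record layout, the name `unbalanced` and the sample numbers are all hypothetical.

```scheme
(use-modules (srfi srfi-1))

;; Each record is (name section-count cross-count ...).
;; A record is unbalanced if any cross-section count differs
;; from the section count by more than EPSILON.
(define (unbalanced records epsilon)
  (filter
    (lambda (rec)
      (let ((scnt (cadr rec)))
        (any (lambda (xcnt) (< epsilon (abs (- scnt xcnt))))
             (cddr rec))))
    records))

;; Made-up sample data: one balanced record, one not.
(define sample
  (list (list "balanced" 4 4 4 4)
        (list "lopsided" 5 2 2 2)))

(format #t "~A\n" (map car (unbalanced sample 1.0e-6)))
```

As in the original, `any` stops at the first mismatching cross-section, so a section is reported once no matter how many of its crosses are off.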
none + (f, kl{ej}) -> p * ({ej}, kl{ej}) + (1-p) * (f, kl{ej}) connector-merge-tricon.scm 385.2375 388.0 (f, kl{ej}) <- unwanted (a, kl{ej}) (define sec-a-kle #f) (define sec-f-kle #f) (define sec-a-klv #f) (define sec-f-klv #f) ------------------------- try again: ------ Create: Merged 202021 sections in 210.0 secs; 962.00 scts/sec ------ Create: Revised 202021 shapes in 559.0 secs; 361.40 scts/sec ------ Create: cleanup 1 in 312.0 secs; 0.0032 ops/sec (cog-report-counts) -- no change... (define bad (check-balance csc)) (length bad) $3 = 74 ... no change. So the test fix fixed nothing here... sec-ej-abv ({ej}, g{ej}i) -- 5 [g, <{ej}, v{ej}i>] -- 2 [{ej}, <{ej}, gvi>] -- 2 [i, <{ej}, g{ej}v>] -- 2 (Section (WordNode "e") (ConnectorSeq (Connector (WordNode "g") (ConnectorDir "-")) (Connector (WordNode "e") (ConnectorDir "+")) (Connector (WordNode "i") (ConnectorDir "+")))) sec-j-gji 2 sec-j-gei 1 sec-e-gji 2 sec-e-gei 0 start with donor x-cnt= 2.0 0.0 2.0 for SEC=(Section (ctv 1 0 0) sec-e-gji This has zero-count... so was fully merged before flat - 0 after flat 4 and no xes start with donor x-cnt= 2.0 0.0 2.0 for SEC=(Section (ctv 1 0 0) sec-j-gji before flat 4 after flat - 4 and no xes crosser sec-j-gei reg has count of 4.... .. pre-no xes, post xes=3 so this is less than reg... crosser sec-j-gji .. pre has xes=3, post has 0 for reg=4 start with donor x-cnt= 1.0 0.0 1.0 for SEC=(Section (ctv 1 0 0) sec-j-gei now have xes, but of zero count, before = 4 after =5 and all xes have 5 crosser sec-e-gji .. pre has xes=5 post has 2 for reg=5 so count is effed later. 
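
The three cross-sections listed above for ({ej}, g{ej}i) follow mechanically from the section. A minimal sketch of that explosion on plain lists, assuming one cross-section per connector, with the connector's slot replaced by a variable marker; connector directions are dropped, and the name `explode` and the marker `v` are hypothetical.

```scheme
(use-modules (srfi srfi-1))

;; Explode a section, given as a germ and a list of connector words,
;; into one cross-section per connector. Each result has the form
;; (point germ shape), where shape is the connector list with the
;; point's slot replaced by the marker v.
(define (explode germ connectors)
  (map
    (lambda (idx)
      (list (list-ref connectors idx)
            germ
            (append (take connectors idx)
                    (list 'v)
                    (drop connectors (+ idx 1)))))
    (iota (length connectors))))

;; The section ({ej}, g{ej}i) from the notes.
(for-each
  (lambda (xs) (format #t "~A\n" xs))
  (explode 'ej '(g ej i)))
```

The three results correspond to [g, <{ej}, v{ej}i>], [{ej}, <{ej}, gvi>] and [i, <{ej}, g{ej}v>]. For the section to be balanced, each of these should carry the same count as the section itself.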
j abj e abj j abe done e abe e abj sec-ej-abv sec-ej-vgh xes-a-e-vbe from (e, abe) (csc 'get-cross-sections (Section (WordClassNode "e j") (ConnectorSeq (Connector (WordNode "g") (ConnectorDir "-")) (Connector (WordClassNode "e j") (ConnectorDir "+")) (Connector (WordNode "i") (ConnectorDir "+")))) (CrossSection (WordNode "g") (ShapeLink (WordClassNode "e j") (Connector (VariableNode "$connector-word") (ConnectorDir "-")) (Connector (WordClassNode "e j") (ConnectorDir "+")) (Connector (WordNode "i") (ConnectorDir "+")))) (CrossSection (WordClassNode "e j") (ShapeLink (WordClassNode "e j") (Connector (WordNode "g") (ConnectorDir "-")) (Connector (VariableNode "$connector-word") (ConnectorDir "+")) (Connector (WordNode "i") (ConnectorDir "+")))) -------------------- After delete fix: (ConnectorSeq . 119999) (Section . 51904) (ShapeLink . 300931) (CrossSection . 357720) half the cross-sections disappear after count fix: (ConnectorSeq . 119999) (Section . 51904) (ShapeLink . 301129) (CrossSection . 358315) OK, so this is the final fix (define bad (check-balance csc)) (length bad) $3 = 0 provided by commit 5e1d7dfb94867f22642d7cdf0621a833bb96092e (define (find-zero) (define epsilon 1.0e-6) (filter (lambda (sect) (< (cog-count sect) epsilon)) (cog-get-atoms 'Section))) (define gsc (add-cluster-gram csc)) (define sta (add-pair-stars gsc)) (define disc (make-discrim sta 0.25 4 4)) (disc 'merge-function (Word "e") (Word "j")) ----------------------- guile -l ~/src/learn/run-common/cogserver-gram.scm (define psa star-obj) (gram-classify-greedy-disinfo psa 3.0 4) (define gram-obj (add-cluster-gram cset-obj)) (define psa (add-pair-stars gram-obj)) (gram-classify-greedy-disinfo psa 8.0 4) total fail. wtf. Oh, no its fine. just not printing. 
(define pca (make-pseudo-cset-api)) (define psa (add-pair-stars pca)) (psa 'fetch-pairs) (define gram-obj (add-cluster-gram pca)) (define gsa (add-pair-stars gram-obj)) (gram-classify-greedy-disinfo gsa 8.0 4) (gram-classify greedy-over-words (make-disinfo STARS MI ZIPF MIN-OBS)) (greedy-grow MERGER CLS-LST singletons done-list todo-words) (assign-word-to-class MERGER wrd TRUE-CLS-LST) (MERGER 'merge-predicate cls WRD) (is-similar? get-mi CUTOFF WORD-A WORD-B)) Dist=0.2484 for word "e" -- "j" in 0.066 secs Dist=0.5501 for word "d" -- "a" in 0.785 secs Dist=0.3725 for word "e" -- "b" in 0.056 secs Dist=0.3929 for word "j" -- "b" in 0.061 secs Dist=0.8141 for word "d" -- "f" in 0.100 secs Dist=0.5596 for word "a" -- "f" in 0.712 secs Dist=0.8126 for word "d" -- "h" in 0.127 secs Dist=0.6048 for word "a" -- "h" in 0.709 secs Dist=1.2800 for word "f" -- "h" in 0.060 secs Dist=0.7817 for word "d" -- "c" in 0.083 secs Dist=0.5544 for word "a" -- "c" in 0.740 secs Dist=1.3032 for word "f" -- "c" in 0.032 secs Dist=1.2824 for word "h" -- "c" in 0.051 secs Dist=0.3896 for word "e" -- "g" in 0.070 secs Dist=0.2684 for word "j" -- "g" in 0.062 secs Dist=0.8982 for word "b" -- "g" in 0.050 secs Dist=0.0819 for word "e" -- "i" in 0.080 secs Dist=0.0185 for word "j" -- "i" in 0.097 secs Dist=1.4862 for word "b" -- "i" in 0.103 secs Dist=3.2143 for word "g" -- "i" in 0.082 secs ========================================================== /home2/linas/lxc-local-containers/learn-en-guile-2.2/rootfs/home/ubuntu/run/alpha-guten-tranche-1/split-books/ du -s text 49549728 text (monitor-parse-rate "yoo") rate to disk: done=236300 rate=5.251 per sec gc-stats gc-time-taken . 8679181887835) == 8679 seconds = 2.4 hours out of 43 hours cpu (heap-total-allocated . 798349198464) = 800 GB gc-times . 62637) -> 13MB per gc and 140 mSec/gc total 2591 minutes cpu time -> 0.66 cpu-secs sentence. Ouch. echo "." 
| nc -N localhost 17005 Running to disk -- took real 1372m28.541s user 5m24.723s sys 2m34.024s vs. 4816:58 cpu time. Database contents: Next aid: 19665661 Atoms/Links/Nodes a@: 19665660 l@: 19560096 n@: 105556 Keys/Incoming/Hash k@: 9885609 i@: 39112973 h@: 0 du -s * 1631048 en_pairs.rdb 1487000 en_pairs.rdb 1441908 en_pairs.rdb That works out to ... 75 bytes per atom! Whoa! Above is for 426556 sentences in tranche-1 ------------------ new code segfaults... damn. done: 1955 split: 1 left: 1071 Throw to key `C++-EXCEPTION' with args `("dflt-fetch-incoming-set" "Syntax error at line 0 Unexpected text: >>A\")))<<\nFunction args:\n((LgLinkNode \"ANY\")\n)")'. (use-modules (opencog) (opencog logger)) (use-modules (opencog persist)) (use-modules (opencog matrix)) (use-modules (opencog nlp) (opencog nlp learn)) (define sns (getenv "STORAGE_NODE")) (use-modules (opencog persist-rocks)) (define storage-node (eval-string sns)) (cog-open storage-node) (fetch-all-words) (define ala (make-any-link-api)) (define asa (add-pair-stars ala)) (asa 'fetch-pairs) opencog/nlp/learn/batch-word-pair.scm: (fetch-any-pairs) (fetch-incoming-set any-pair-pred) (define any-pair-pred (LgLinkNode "ANY")) (fetch-incoming-set any-pair-pred) (use-modules (opencog extension)) (opencog-extension dflt-fetch-incoming-set ((LgLinkNode "ANY"))) Wrong type to apply: (LgLinkNode "ANY") (dflt-fetch-incoming-set (LgLinkNode "ANY")) (WordNode "Zek.\\") (ListLink (WordNode "Zek.\\")(WordNode "A")) (ListLink (WordNode "Zek.\";\"")(WordNode "A")) optimization.... for string so strem ... get_next_expr Program terminated with signal SIGSEGV, Segmentation fault. 
--Type for more, q to quit, c to continue without paging-- #0 0x00007f2f6b8515ae in dictionary_all_categories (dict=0x7f2ec8002f90) at ../../link-grammar/tokenize/tokenize.c:3003 3003 dn[dict->num_categories-1].right = NULL; dict->num_categories is 0 ANY-PUNCT orig_sentence = 0x7f2bcf95f9b0 "regex: (?<=\\n{3})[\\* ]{0,2}\\=([^=]+?)\\=", Oh foo -- fixed long ago in #1206 ---------------------- 161516 r2-mpg-trim-10-2-1.rdb 95060 r2-mpg-trim-20-4-3.rdb 47944 r2-mpg-trim-40-8-5.rdb opencog/matrix/direct-sum.scm:261:36: In procedure left-wildcard: Wrong type to apply: #f duude djr=#t RA=(ConnectorSeq (Connector (WordNode "\"") (ConnectorDir "+")) (Connector (WordNode "captain") (ConnectorDir "+")) (Connector (WordNode ".") (ConnectorDir "+"))) init-a-base duuude djlrt=#f #t #t duuude ABR= ConnectorSeq ShapeLink ======================================================= (define rs (pmi 'right-stars (Word "him"))) (length rs) ; 834 (define cs (filter (lambda (atom) (equal? (cog-type atom) 'CrossSection)) (pmi 'right-stars (Word "him")))) (length cs) ; 430 for: impossible- & him+ do: to- & him+; upon: came- & him+ at: gazing- & him+ (cog-incoming-set (ShapeLink (WordNode "at") (Connector (WordNode "gazing") (ConnectorDir "-")) (Connector (VariableNode "$connector-word") (ConnectorDir "+")))) him, the at: (gazing- & him+) or (gazing- & the+) upon: (came- & him+) or (came- & a+) do: to- & (him+ or her+ or it anything in a something them what ? so with as the that this . , ) for: impossible- & (him+ or me+ or any+ or them+) (define sc (filter (lambda (atom) (equal? 
(cog-type atom) 'Section)) (pmi 'right-stars (Word "him")))) (length sc) ; 404 (define sis (pmi 'right-stars (Word "sink"))) (length sis) ; 1 (CrossSection (ctv 1 0 9) (WordNode "sink") (ShapeLink (WordNode "to") (Connector (WordNode "seemed") (ConnectorDir "-")) (Connector (VariableNode "$connector-word") (ConnectorDir "+")))) (cog-incoming-set (WordNode "sink")) (cog-incoming-set (ShapeLink (WordNode "to") (Connector (WordNode "seemed") (ConnectorDir "-")) (Connector (VariableNode "$connector-word") (ConnectorDir "+"))))) like them Pierre grow her have go consider see be take feel say think him make come sink fill me (seemed to xxx) count on "him" : 151 (seemd to him) mmt-fmi (define (compute-mmt-fmi ROW-A ROW-B) (define marga (trans-obj 'mmt-count ROW-A)) (define margb (trans-obj 'mmt-count ROW-B)) (define prod (compute-right-product ROW-A ROW-B)) (set-mmt-total) (log2 (* prod mmt-total) (* marga margb))) (define (compute-right-product ROW-A ROW-B) (prod-obj 'right-count (list ROW-A ROW-B))) (define prod-obj (add-support-compute (add-tuple-math star-obj * 'get-count))) (prod-obj 'right-count (list (Word "him") (Word "sink"))) ; 1359.0 151 * 9 = 1359 perfect. (define trans-obj (add-transpose-api star-obj)) sum_d N(sink,d) N(*,d) = 10215 N(sink,d) = 9 N(*,d) = 1135 (map (lambda (sw) (format #t "~A " sw) (ent "sink" sw)) (list "like" "them" "Pierre" "grow" "her" "have" "go" "consider" "see" "be" "take" "feel" "say" "think" "him" "make" "come" "sink" "fill" "me")) mtm-count set-mtm-norms guile -l marginals-mst-shape.scm print-transpose-summary-report Then: Rocks: opened=/home/ubuntu/data//r2-gram-shape-40-junk.rdb Rocks: initial aid=208154 "*-Direct Sum Wild (cset⊕cross-section)" MM^T support=171922.0 count=6842054908.0 entropy=2.1110 OK, so above does this: -- computes MM^T for the full dataset, then filters rows and columns (Plus support was computed incorrectly) Now: MM^T support=300245.0 count=561295158.0 entropy=7.3274 Why is the count so much smaller? 
Because this is MM^T only on the filtered dataset, not the full one. entropy = log_2 sum_d N(*,d)N(*,d) / [sum_d D(*,d)D(*,d)]^2 Go back to old master: MM^T support=52791.0 count=561295158.0 entropy=2.3118 This is MM^T on the filtered dataset with the broken support code. WTF. clobber the marginals: MM^T support=6455136.0 count=6842054908.0 entropy=12.572 4565380dc65169018a7c11faa87a4e32fc81bf9a -- pre amplitude and no clobber. Gives crazy high MI. MM^T support=300245.0 count=561295158.0 entropy=7.3274 Above, with the clobber of support. MM^T support=6455136.0 count=6842054908.0 entropy=12.572 Sigh. oh tears asleep window lips answer closed afraid game lord reply red tree Crawley pale hurry loved set-right-totals why not L2, L3 ?? get-total-support-right no L2, L3 ... make-central-compute all-left-marginals all-mmt-marginals does not call above... (define btr (batch-transpose csc)) (btr 'mmt-marginals) total-support-right (define left-total-key (PredicateNode "*-Left Total Key-*")) (trans-obj (add-transpose-api star-obj)) (prod-obj (add-support-compute (add-tuple-math star-obj * 'get-count))) (define marga (trans-obj 'mmt-count ROW-A)) (define margb (trans-obj 'mmt-count ROW-B)) ; (define prod (compute-right-product ROW-A ROW-B)) (define prod (prod-obj 'right-count (list ROW-A ROW-B))) (define mmt-total (trans-obj 'total-mmt-count)) (log2 (* prod mmt-total) (* marga margb))) norm-key (add-pair-stars LLOBJ) 'left-stars (LLOBJ 'right-wildcard ITEM) ----------------- (define cset-obj (make-pseudo-cset-api)) (define cstars (add-pair-stars cset-obj)) (define csc (add-covering-sections cset-obj)) (define pwrds (cstars 'left-basis)) (define cwords (csc 'left-basis)) (length pwrds) ; 434 (length cwords) ; 1294 (define ps (make-aset-predicate pwrds)) (define wtf (atoms-subtract cwords pwrds)) (take wtf 10) (WordNode "evening") (Connector (WordNode "evening") (ConnectorDir "+")) evening: the: in- & evening+ (ConnectorSeq (Connector (WordNode "in") (ConnectorDir "-")) 
(Connector (WordNode "evening") (ConnectorDir "+"))) (define inte (Section (ctv 1 0 456) (WordNode "the") (ConnectorSeq (Connector (WordNode "in") (ConnectorDir "-")) (Connector (WordNode "evening") (ConnectorDir "+"))))) (define fsa (add-subtotal-filter psa 640 128 80 #f)) (define lfa (add-linkage-filter fsa)) (define ae (lfa 'get-all-elts)) ; takes an hour... (length ae) ; 6035 (define in-ap? (make-aset-predicate ae)) (in-ap? inte) ; #t whoops! (define sfs (fsa 'left-basis)) (length sfs) ; 4072 (define in-sf? (make-aset-predicate sfs)) (in-sf? (WordNode "evening")) ; #t (fsa 'right-stars (WordNode "evening")) ... is the empty set. (fsa ' (define zfa (add-zero-filter fsa #f)) (define zfs (zfa 'left-basis)) Very slow, cause it requires right-stars .. for "any" yuck. (length zfs) ; 457 whoa .... supp-obj (add-support-compute 'right-count is sum over right-stars i.e. N(w,*) ----------------- Its broken again. (define cset-obj (make-pseudo-cset-api)) (define cstars (add-pair-stars cset-obj)) (define csc (add-covering-sections cset-obj)) (define pwrds (cstars 'left-basis)) (define cwords (csc 'left-basis)) (length pwrds) ; 409 (length cwords) ; 437 (define ps (make-aset-predicate pwrds)) (define wtf (atoms-subtract cwords pwrds)) (take wtf 10) (WordNode "your") [your: ] --- and: you- & your+ [your: ] --- of: out- & your+ (Connector (WordNode "your") (ConnectorDir "+")) (ConnectorSeq (Connector (WordNode "out") (ConnectorDir "-")) (Connector (WordNode "your") (ConnectorDir "+"))) (Section (ctv 1 0 135) (WordNode "of") (ConnectorSeq (Connector (WordNode "out") (ConnectorDir "-")) (Connector (WordNode "your") (ConnectorDir "+")))) (cstars 'right-stars (WordNode "your")) ... empty set. How did this happen? (define zsc (add-zero-filter csc #f)) (length (zsc 'left-basis)) ; 437 ... why isn't this working? oh right. ; try again (define zob (add-zero-filter cstars #f)) (define zsc (add-covering-sections zob)) (length (zsc 'left-basis)) ; 437 .... 
wtf ; try again (define lob (add-linkage-filter cstars)) (define lsc (add-covering-sections lob)) (length (lsc 'left-basis)) ; 437 .... wtf ; try again (define lob (add-linkage-filter cstars)) (length (lob 'left-basis)) ; 409 OK. (define zlb (add-zero-filter lob #f)) (define czl (add-covering-sections zlb)) (length (czl 'left-basis)) ; 437 .... wtf --- (length (cstars 'right-basis)) ; 2496 (length (delete-dup-atoms (map (lambda (SEC) (gdr SEC)) (cstars 'get-all-elts)))) ; 2496 .. again. (define conseqs (delete-dup-atoms (map (lambda (SEC) (gdr SEC)) (lob 'get-all-elts)))) (length conseqs) ; 2405 so some removed. (define conwords (delete-dup-atoms (concatenate (map (lambda (CONSEQ) (map (lambda (CON) (gar CON)) (cog-outgoing-set CONSEQ))) conseqs)))) (length conwords) ; 357 oh huh. much smaller... --- (length (psa 'right-stars (WordNode "your"))) ; 36956 (define fsa (add-subtotal-filter psa 640 128 80 #f)) (length (fsa 'right-stars (WordNode "your"))) ; 1 your: beg- & pardon+ (define zfa (add-zero-filter fsa #f)) (define zfs (zfa 'left-basis)) (length zfs) ; 457 (zfa 'right-stars (WordNode "your")) (length (zfa 'right-stars (WordNode "your"))) ; same as before (define lfa (add-linkage-filter zfa)) (lfa 'right-stars (WordNode "your")) ; Now its the empty set. (takes long time) (length (lfa 'left-basis)) ; 457 ... ouch (length (lfa 'right-basis)) ; 2496 (length (delete-dup-atoms (map (lambda (SEC) (gdr SEC)) (lfa 'get-all-elts)))) ; 2496 arghh ================================================= r3-zfil-640-128-80-fresh.rdb Done storing 4006 pairs in 1 secs Total store time = 5275 secs r3-zfil-640-128-80-wtf.rdb guile -l cogserver.scm (load-atomspace) (cog-report-counts) (Connector . 666) (ConnectorSeq . 2496) (Section . 4006) (WordNode . 
437) (define all-words (cog-get-atoms 'WordNode)) (define all-sects (cog-get-atoms 'Section)) (define all-seqs (cog-get-atoms 'ConnectorSeq)) (define all-cons (cog-get-atoms 'Connector)) (define pca (make-pseudo-cset-api)) (define psa (add-pair-stars pca)) (define btr (batch-transpose psa)) (btr 'mmt-marginals) (cog-report-counts) (ListLink . 2906) (print-matrix-summary-report psa) Rows: 409 Columns: 2496 Size: 4006 non-zero entries Sparsity (-log_2): 7.9934 entropy=-3.191 (report-plain) (cog-report-counts) (WordNode . 451) <--- from the report. guile -l marginals-mst-shape.scm Rows: 437 Columns: 6961 Size: 12340 Sparsity (-log_2): 7.9455 entropy=-1.338 .. this is correct. (cog-report-counts) (Connector . 668) (ShapeLink . 4465) (CrossSection . 8334) (EvaluationLink . 438) guile -l marginals-mst.scm ... still OK. (load "/tmp/smack.scm") ------------------------------------------------------- 54 threads 1: futex_wait_cancelable scm_timed_lock_mutex scm_call_n 2 3 4 5 6 7 8 9 10-16: futex_wait_cancelable GC_wait_marker 17 45 __GI___libc_read GC_do_blocking_inner scm_call_n 18 19 20 futex_wait_cancelable LogWriter::writing_loop opencog/util/Logger.cc:239 18: opencog.log 19: /tmp/cogserver.log 20: /tmp/run-3-shape.log 21 __libc_pause immortal_thread 22 __libc_accept NetworkServer::listen 23 __GI___clock_nanosleep CogServer::serverLoop 24-43: futex_wait_cancelable rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper 44 futex_wait_cancelable rocksdb::DeleteScheduler::BackgroundEmptyTrash() 46 49 50 __libc_recvmsg ServerSocket::handle_connection 47 53 54 futex_wait_cancelable SchemeEval::do_eval 48 52 futex_abstimed_wait_cancelable SchemeEval::do_poll_result 51 _int_free PatternMatchEngine::solution_pop ...opencog::PrimitiveEnviron::do_call above is running at 100% in thread 51 the other threads that should be running are: 47 53 54 so whats's up? 
futex_wait_cancelable __pthread_cond_wait_common __pthread_cond_wait
scm_timed_lock_mutex scm_call_n scm_call_3 scm_call_n scm_call_2
scm_c_catch opencog::SchemeEval::do_eval

So the other threads are waiting on the pattern-matcher thread,
because the pattern matcher thread must be running with a lock held.
Which lock?

54 Thread 0x7f5dbbfff700 (LWP 18984)
53 Thread 0x7f5dd0ff9700 (LWP 18983)
51 Thread 0x7f5dd1ffb700 (LWP 18575)
47 Thread 0x7f5dd3fff700 (LWP 5861)

set thread name....
#include <sys/prctl.h>
prctl(PR_SET_NAME, "fo", 0, 0, 0);
16 bytes
cogserv:listen
cogserv:

NetworkServer::listen
CogServer::serverLoop
ServerSocket::handle_connection
GenericShell::eval_loop poll_loop

__pthread_cond_wait scm_timed_lock_mutex scm_call_n scm_call_3
scm_call_n scm_call_2 scm_c_catch opencog::SchemeEval::do_eval

scm_try_mutex scm_lock_mutex
libguile/vm.c ret = vm_engines[vp->engine](thread);
lock-mutex

add-pair-stars object
run-query raii-get-pattern
f-left-star-pat f-right-star-pat f-left-dual-pat f-right-dual-pat
default-left-dual-pat
default-right-dual-pat -- uses GetLink --> use MeetLink instead
default-right-star-pat -- uses Bind --> use Query instead.
use cog-value->list
left-star-pattern right-star-pattern left-dual-pattern right-dual-pattern

Fixed in #2834. Yay!

-------------------
(define (dump A) (format FH "(~A #f)
(cog-map-type dump 'Atom)

----------
for amirouche: run-1-en_mpg-tranche-1.rdb

(use-modules (opencog) (opencog persist) (opencog persist-file))
(use-modules (opencog nlp) (opencog persist-rocks))
(use-modules (opencog matrix) (opencog nlp learn))
(define rsn (RocksStorageNode "rocks:///data/run-1-en_mpg-tranche-1.rdb"))
(cog-open rsn)
;;; (load-atomspace rsn)
(define cset-obj (make-pseudo-cset-api))
(cset-obj 'fetch-pairs) ; 994 secs
(print-matrix-summary-report cset-obj)
(cog-close rsn)
(count-all) ; 15996448
(cog-report-counts)
((PredicateNode . 11) (ListLink . 1) (AnyNode . 2) (Connector . 260275)
(ConnectorDir . 2) (ConnectorSeq . 7470276) (Section .
8131679) (AnchorNode . 1) (SchemaNode . 1) (RocksStorageNode . 1) (WordNode . 134199)) (define fsn (FileStorageNode "/tmp/en-tranche-1-mpg.scm")) (cog-open fsn) (store-atomspace fsn) (cog-close fsn) RSS 12.4 G 2474901877 96924177 =================================================== wtf filter. (psa 'left-basis) (define prs (psa 'get-all-elts)) Ahh. No support! make-compute-freq make-batch-mi batch-all-pair-mi Run `((add-support-compute LLOBJ) 'cache-all)` to compute that data. 'wild-wild-count (length (LLOBJ 'get-all-elts)) (rpt-obj 'num-pairs) --- set with ((add-report-api LLOBJ) 'set-size blah do this minimally: ((make-central-compute LLOBJ) 'cache-total) However, this requires supp-obj 'total-support-left to work and that can't work until .... (sup-obj 'wild-wild-count) Set with 'set-left-totals or set-right-totals Same as get-total-count-left Same as ((add-support-api LLOBJ) 'get-total-count-left) which is set by ... computed by (add-support-api LLOBJ) 'total-support-left Crap. There's no simple efficient way to count. What about fetch-pairs !? Hang on. (define sup (add-support-api psa)) (sup 'total-support-left) is the total pairs. (sup 'total-count-left) is the wild-wild Need ... 'left-dim 'right-dim 'num-pairs ... (inexact->exact (round ... done. (add-support-compute 'cache-all (define sup (add-support-api psa)) (sup 'set-size (psa 'left-basis-size) (psa 'right-basis-size) (sup 'wild-wild-count)) (store-atom (psa 'wild-wild)) (define storo (make-store psa)) (storo 'store-wildcards) ------------------------- (define wi (ListLink (AnyNode "cset-word") (AnyNode "cset-disjunct"))) (cog-keys wi) ((PredicateNode "*-Right Total Key-*") (PredicateNode "*-TruthValueKey-*") (PredicateNode "*-MM^T Product Key-*") (PredicateNode "*-Left Total Key-*") ) and they are good, so wtf (define rpt-obj (add-report-api psa)) (rpt-obj 'left-dim) There is no "*-Dimension Key-*" .. why? cause make-central-compute never ran... 164 min and 69.7 GB before starting central compute. 
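For reference, the 'left-dim, 'right-dim and 'num-pairs values stored by 'set-size are the three numbers the summary report needs for its sparsity line. A small Python sketch, assuming sparsity is log2(rows * cols / non-zero entries); that formula is inferred from the figures reported earlier in these notes, not read out of the matrix code, and both function names are hypothetical.

```python
import math

def matrix_dims(pairs):
    """Return (rows, cols, size) for an iterable of (left, right) pairs."""
    pairs = set(pairs)
    rows = {left for left, _ in pairs}
    cols = {right for _, right in pairs}
    return len(rows), len(cols), len(pairs)

def sparsity(rows, cols, size):
    """-log2 of the fraction of matrix entries that are non-zero."""
    return math.log2(rows * cols / size)

# Figures taken from the two summary reports earlier in the notes.
print(round(sparsity(409, 2496, 4006), 4))
print(round(sparsity(437, 6961, 12340), 4))

# Toy example: two words, two disjuncts, three observed pairs.
dims = matrix_dims([("the", "dj1"), ("the", "dj2"), ("a", "dj1")])
print(dims)
```

With the report's own row, column and size figures this reproduces the 7.9934 and 7.9455 sparsity values, which is some evidence the dimensions were stored consistently.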
---------------------- ((PredicateNode . 14) (ListLink . 26076503) (AnyNode . 2) (Connector . 725460) (ConnectorDir . 2) (ConnectorSeq . 25698949) (Section . 28436901) (TypeNode . 2) (AnchorNode . 1) (SchemaNode . 1) (RocksStorageNode . 2) (WordNode . 377553)) after col trim: (PredicateNode . 14) (ListLink . 2557528) (AnyNode . 2) (Connector . 379443) (ConnectorDir . 2) (ConnectorSeq . 2366262) (Section . 4962919) (TypeNode . 2) (AnchorNode . 1) (SchemaNode . 1) (RocksStorageNode . 2) (WordNode . 191265)) ((PredicateNode . 14) (ListLink . 2557528) (AnyNode . 2) (Connector . 379443) (ConnectorDir . 2) (ConnectorSeq . 2366262) (Section . 4000706) (VariableNode . 2) (QueryLink . 1791930) (TypeNode . 2) (TypedVariableLink . 2) (AnchorNode . 1) (SchemaNode . 1) (RocksStorageNode . 2) (WordNode . 191265)) Ooops QueryLink not cleaned up. Now ((PredicateNode . 14) (ListLink . 2557528) (AnyNode . 2) (Connector . 379443) (ConnectorDir . 2) (ConnectorSeq . 2366262) (Section . 2208776) (TypeNode . 2) (AnchorNode . 1) (SchemaNode . 1) (RocksStorageNode . 2) (WordNode . 191265)) After save and restore: ((PredicateNode . 8) (ListLink . 1791931) (AnyNode . 2) (Connector . 157829) (ConnectorDir . 2) (ConnectorSeq . 1733439) (Section . 2208776) (AnchorNode . 1) (SchemaNode . 1) (RocksStorageNode . 1) (WordNode . 102656)) why more word nodes than array dims ??? (VariableNode "$api-right-star") ; After trimming, there may be left and right basis elements ; that are not in any pairs, but have not been deleted. ; Delete those now. (for-each (lambda (base) (if (and (cog-atom? base) (equal? 0 (cog-incoming-size base))) (cog-delete-recursive! base))) (early-stars 'right-basis)) (cog-get-atoms 'WordNode)) (cog-get-atoms 'Connector)) (WordNode "whipcord") Oh. Linkage filter. (WordNode "Bowline") OK, marginals were not removed... Let's hack that.. (for-each (lambda (base) (if (and (cog-atom? base) (equal? 1 (cog-incoming-size base))) (cog-delete-recursive! 
base))) (cog-get-atoms 'WordNode)) (define cnt 0) (for-each (lambda (base) (if (and (cog-atom? base) (equal? 1 (cog-incoming-size base))) (set! cnt (+ 1 cnt)))) (cog-get-atoms 'ConnectorSeq)) (for-each (lambda (base) (if (and (cog-atom? base) (equal? 0 (cog-incoming-size-by-type base 'Section))) (cog-delete-recursive! base))) (cog-get-atoms 'ConnectorSeq)) (cog-get-atoms 'Word)) WTF. How can we *still* be hitting the above??? ((make-store psa) 'store-wildcards) ---------------- dropdb en_mpg_53 dropdb en_huge_marg -------------------- (load-atoms-of-type 'Any) (load-referers (AnyNode "cset-word")) (load-referers (AnyNode "cset-disjunct")) (define pca (make-pseudo-cset-api)) (pca 'wild-wild) Gahhh Atoms tot 156M of which 106M are word-pairs. 75072984 (ListLink . 37060608) (Connector . 951753) (ConnectorSeq . 36562917) (WordNode . 497690) after trimming words: 71610887 (ListLink . 35561670) (Connector . 487523) (ConnectorSeq . 35315198) (WordNode . 246471) after trimming connseqs: cp -pr run-1-marg-tranche-12.rdb run-1-t12-trim-1-1-1.rdb (define storage-node (RocksStorageNode "rocks:///home/ubuntu/data/run-1-t12-trim-1-1-1.rdb")) (define pca (make-pseudo-cset-api)) (fetch-atom (pca 'wild-wild)) (print-matrix-summary-report pca) ------------------------- (define sim (add-similarity-compute STARS)) (sim 'foo wa wb) define-public (make-merger STARS MPRED FRAC-FN NOISE MIN-CNT STORE MRG-CON) , the ###LEFT-WALL### . of to and a in Crash: mmap(PROT_NONE) failed Aborted bdwgc os_dep.c: ABORT("mmap(PROT_NONE) failed"); line 2658 GC_INNER void GC_unmap(ptr_t start, size_t bytes) Happened again... 
and a third time: #2 0x00007ffff7a27513 in GC_unmap (start=, bytes=) at ../extra/../os_dep.c:2576 #3 GC_unmap (start=, bytes=) at ../extra/../os_dep.c:2535 #4 0x00007ffff7a27601 in GC_unmap_old () at ../extra/../allchblk.c:419 #5 0x00007ffff7a2eef5 in GC_unmap_old () at ../extra/../allchblk.c:404 #6 GC_finish_collection () at ../extra/../alloc.c:1115 #7 0x00007ffff7a2f375 in GC_try_to_collect_inner ( stop_func=0x7ffff7a1c2d0 ) at ../extra/../alloc.c:553 #8 GC_try_to_collect_inner (stop_func=0x7ffff7a1c2d0 ) at ../extra/../alloc.c:485 #9 0x00007ffff7a2f69c in GC_collect_or_expand (needed_blocks=needed_blocks@entry=220, ignore_off_page=ignore_off_page@entry=0, retry=) at ../extra/../alloc.c:1443 #10 0x00007ffff7a2fbcf in GC_alloc_large (lb=, lb@entry=898912, k=k@entry=1, flags=flags@entry=0) at ../extra/../malloc.c:66 #11 0x00007ffff7a33bb9 in GC_generic_malloc (lb=898912, k=1) at ../extra/../malloc.c:264 #12 0x00007ffff7a33e12 in GC_malloc_kind_global (lb=898912, k=1) at ../extra/../malloc.c:327 #13 0x00007ffff7f437a9 in scm_c_make_vector () from /usr/lib/x86_64-linux-gnu/libguile-3.0.so.1 #14 0x00007ffff7edc1d6 in ?? 
() from /usr/lib/x86_64-linux-gnu/libguile-3.0.so.1 #15 0x00007ffff7edc656 in scm_hash_fn_create_handle_x () from /usr/lib/x86_64-linux-gnu/libguile-3.0.so.1 #16 0x00007ffff7edc9ee in scm_hash_fn_set_x () from /usr/lib/x86_64-linux-gnu/libguile-3.0.so.1 #17 0x00007ffff7edcc0f in scm_hashx_set_x () #24 0x00007ffff7f4d705 in scm_call_n () from /usr/lib/x86_64-linux-gnu/libguile-3.0.so.1 #25 0x00007ffff7ec7133 in scm_call_3 () from /usr/lib/x86_64-linux-gnu/libguile-3.0.so.1 if (GC_n_heap_sects >= MAX_HEAP_SECTS printf("duude aeeiiie sects=%ld\n", GC_n_heap_sects); 1343 commit d3dede3ce4462cd82a15f161af797ca51654546a (HEAD, tag: v8.0.4, release-8_0_4) Author: Ivan Maidanski Date: Sat Mar 2 10:09:42 2019 +0300 [8.0.4] Bump gc version to 8.0.4 ../configure --enable-large-config make make check sudo make install Hang on, was this linked, or was it libgc-1.3.2 from install ??? /usr/lib/x86_64-linux-gnu/libgc.so.1.3.2 ldd /usr/bin/guile says: libgc.so.1 => /usr/local/lib/libgc.so.1 so good (bover 'batch-list LIST) cog-rocks-stats: Atomspace holds 895752 atoms Connected to `rocks:///home/ubuntu/data//run-1-t123-tsup-1-1-1.rdb` Database contents: Next aid: 138920740 Atoms/Links/Nodes a@: 25214554 l@: 25191416 n@: 22826 Keys/Incoming/Hash k@: 37241924 i@: 50417644 h@: 0 Unix max open files rlimit= 1024 1048576 (WordNode . 22899) (WordNode . 19606) Database contents: Next aid: 138920740 Atoms/Links/Nodes a@: 22773596 l@: 22753677 n@: 19629 Keys/Incoming/Hash k@: 33583720 i@: 45542826 h@: 0 Unix max open files rlimit= 1024 1048576 (ListLink (ConnectorSeq (Connector (WordNode "It") (ConnectorDir "-")) (Connector (WordNode "no") (ConnectorDir "+")) (Connector (WordNode "uncommon") (ConnectorDir "+"))) (AnyNode "cset-disjunct")) (define (get-right-wildcard WORD) (ListLink WORD any-right)) (right-wildcard) --------------------------- Possible alternate titles: “Recongizing part-whole hierarchies”. 
Solving the frame problem
Solving the symbol grounding problem

initially 85219

(define sup (add-support-api sha))
(define rdl (sha 'right-duals wb))
(define psu (add-support-compute sha))
(psu 'set-right-marginals wb) ;; Uh needed?
psu 'all-left-marginals ; too slow
(for-each (lambda (COL) (psu 'set-left-marginals COL)) (sha 'right-duals wb))
(atc 'set-mmt-marginals wb)
(define e (make-elapsed-secs))
(psu 'set-left-totals) ; needed?
(e)
store-mmt
(store-atom (atc 'set-mmt-totals))
(sha 'right-duals (WordClass "to from"))

store-mmt
asc 'set-left-marginals DJ calls left-stars on column (dj)
these have sections and crosses as appropriate, but wclass??

direct sum:
disjoint-left is #t if left types differ
distinct-type is #t if either differ

((PredicateNode . 13) (ListLink . 220087) (AnyNode . 7) (Connector . 17092)
(ConnectorDir . 2) (ConnectorSeq . 205003) (Section . 855718)
(ShapeLink . 838580) (CrossSection . 1922250) (VariableNode . 1)
(TypeNode . 4) (TypeChoice . 2) (AnchorNode . 1) (SchemaNode . 1)
(RocksStorageNode . 1) (WordNode . 15083))

wtf... add-covering-sections .. really!?
arg.... add-covering-sections returns wrong pairs, wrong left-right.
Double check -- why weren't we using this before?

OK, so no counts... how did that happen?
(define sup (add-support-api sha))
(sup 'right-count (WordNode "—")) ; 20172.0
(sup 'right-count (WordNode "+")) ; 9339.0
(sup 'right-count (WordClassNode "— +")) ; 14900.0
(sup 'right-count (WordNode "coincidence"))

Did clobber do this? Yes it did! Ouch!
clobber kills all left and right support marginals. This is too extreme.
(recompute-mmt is defined already ... Ohh...

(pair-wise-cluster LLOBJ NRANK LOOP-CNT)

(define wc (WordClass "should could"))
(define cnt 0)
(for-each (lambda (in)
   (if (not (equal? (cog-type in) 'SimilarityLink))
      (format #t "duude ~A\n" in)))
   (cog-incoming-set wc))
(remove-empty-sections sha wc)
(define ci (cog-incoming-set wc))

Failed merge (There there) (should could) be
Some cross-sections hanging out.
So appearantly, this is a real merge bug. We need unit tests. [be, ({sc}, there- & $+)] [be, ({sc}, There- & $+)] [there, ({sc}, $- & be+)] [There, ({sc}, $- & be+)] (define th-sc-vbe (CrossSection ; (ctv 1 0 67) (WordNode "There") (ShapeLink (WordClassNode "should could") (Connector (VariableNode "$connector-word") (ConnectorDir "-")) (Connector (WordNode "be") (ConnectorDir "+")))) ) (define be-sc-thv (CrossSection ; (ctv 1 0 67) (WordNode "be") (ShapeLink (WordClassNode "should could") (Connector (WordNode "There") (ConnectorDir "-")) (Connector (VariableNode "$connector-word") (ConnectorDir "+")))) ) add-cluster-gram gram-class-api.scm (define maybes-sc-thv (cog-link 'Shape (WordClassNode "should could") (Connector (WordNode "There") (ConnectorDir "-")) (Connector (VariableNode "$connector-word") (ConnectorDir "+")))) (define be-sc-thv #f) (if maybes-sc-thv (set! be-sc-thv (cog-link 'CrossSection (WordNode "be") maybes-sc-thv))) (format #t "duuude excess ~A and ~A\n" maybes-sc-thv be-sc-thv) OK, so these are there prior to the merge! (define maythere (cog-link 'Section (WordClassNode "should could") (ConnectorSeq (Connector (WordNode "There") (ConnectorDir "-")) (Connector (WordNode "be") (ConnectorDir "+"))))) (reshape-merge LLOBJ CLA MRG CLB PAIR 1.0 NOISE) (define badcnt 0) (define goodcnt 0) (define (check-x xsec) (define sect (sha 'get-section xsec)) (if (nil? sect) (begin (format #t "oh no! bad cross=~A\n" xsec) (set! badcnt (+ 1 badcnt)) ) (set! 
goodcnt (+ 1 goodcnt)) ) ) (for-each (lambda (CLS) (for-each (lambda (SHP) (for-each check-x (cog-incoming-by-type SHP 'CrossSection)) ) (cog-incoming-by-type CLS 'ShapeLink)) ) (cog-get-atoms 'WordClass)) 6152 bad ones (define cls (car (cog-get-atoms 'WordClass))) (define ship (list-ref (cog-incoming-by-type cls 'ShapeLink) 2)) (define xes (cog-incoming-by-type ship 'CrossSection)) (for-each (lambda (SHP) (for-each check-x (cog-incoming-by-type SHP 'CrossSection)) ) (cog-incoming-by-type cls 'ShapeLink)) ; Sanity check (define (check-crosses LLOBJ WHERE) (for-each (lambda (XSECT) (define sect (LLOBJ 'get-section XSECT)) (when (nil? sect) (format #t "Fail with an F on ~A\n" XSECT) (throw 'bad-cleanup WHERE "CrossSections without Sections"))) (cog-get-atoms 'CrossSection))) (`—`, `+`) -- no problem (`;`, `,`) -- no problem... (`is`, `was`) .. no problem ... wtf ... (`and`, `but`) .. no problem ... (for-each (lambda (n) (pair-wise-cluster sha 100 1) (check-crosses sha 'foo)) (iota 500)) (define (none WA WB) 0.0) (define (always WA WB) #t) (define (store-mmt WRD) #f) (define (store-final) #f) (define mrg (make-merger sha always none 0 0 store-mmt store-final #t)) (mrg 'merge-function (Word "and") (Word "but")) (check-crosses sha 'foo) ;;;; passes good (mrg 'merge-function (WordClass "and but") (Word "is")) (check-crosses sha 'foo) ;; passes good (mrg 'merge-function (WordClass "and but") (Word "was")) (check-crosses sha 'foo) ;; still good .. wtf (mrg 'merge-function (Word "It") (Word "He")) (check-crosses sha 'foo) ;; still good ... (mrg 'merge-function (WordClass "and but") (WordClass "It He")) (check-crosses sha 'foo) ;; still good. Bug is not reproducible... 
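The check-crosses sanity check above boils down to: plug each CrossSection's word back into the hole in its shape, and see whether the resulting Section exists. A Python sketch of that invariant, assuming atoms are plain tuples; `section_of` and `orphan_crosses` are hypothetical names standing in for 'get-section and the loop over CrossSections, not the actual implementation.

```python
VAR = "$connector-word"  # variable name as it appears in the notes

def section_of(cross):
    """Rebuild the (germ, connectors) section a cross-section came from."""
    word, (germ, shape) = cross
    connectors = tuple((word if w == VAR else w, d) for w, d in shape)
    return (germ, connectors)

def orphan_crosses(sections, crosses):
    """Return the cross-sections whose source section is missing."""
    known = set(sections)
    return [x for x in crosses if section_of(x) not in known]

# The "There should/could be" case from the failed merge.
sect = ("should could", (("There", "-"), ("be", "+")))
crosses = [
    ("There", ("should could", ((VAR, "-"), ("be", "+")))),
    ("be", ("should could", (("There", "-"), (VAR, "+")))),
]
print(orphan_crosses([sect], crosses))   # section present: nothing orphaned
print(orphan_crosses([], crosses))       # section deleted: both orphaned
```

Running something like this after every single merge, rather than once at the end, is what the (iota 500) loop above is doing; it narrows down which merge leaves CrossSections behind.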
------------------------
------ Next in line:
ranked-MI = 7.2713 MI = 5.8597 (`would must`, `should could`)
Start merge 23 of `would must` and `should could`
------ Combine: Merged 1197 sections in 0.000 secs; 1255145472.0 scts/sec
------ Combine: Revised 1197 shapes in 1.000 secs; 1197.0 scts/sec
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
Throw to key `non-empy-class' with args `(merge-clusters "we expect it to be empty!")'.

In opencog/nlp/learn/agglo-rank.scm:
   412:16 3 (_ 0)
In opencog/nlp/learn/gram-projective.scm:
   879:32 2 (merge (WordClassNode "would must") #)
   690:16 1 (merge-clusters _ _ _ _ _)

Not in any sections or crosses any more but ...
(cog-incoming-size (Connector (WordClass "should could") (ConnectorDir "-")))
(cog-incoming-size (Connector (WordClass "should could") (ConnectorDir "-")))
119 and 132

(define wc (WordClass "should could"))
(define cm (list-ref (cog-incoming-set wc) 1))
(define cp (list-ref (cog-incoming-set wc) 2))
(define bcnt 0)
(for-each (lambda (CSQ)
   (for-each (lambda (LL)
      (if (< 0 (cog-count LL)) (set! bcnt (+ 1 bcnt))) )
      (cog-incoming-set CSQ)) )
   (cog-incoming-set cp))

----------------------------------
(define sap (add-similarity-api sha #f "shape-mi"))
(define (ranked-mi-sim WA WB)
   (define miv (sap 'pair-count WA WB))
   (if miv (cog-value-ref miv 1) -inf.0))
(define wli (take ranked-words 200))
(optimal-in-group ranked-mi-sim (Word "were") (Word "are") 0.7 wli)

on upon will may had has and but . ? It He in of

fixed bugs:
oh no do we need TypeChoice!? I think we do... DONE
* direct-sum.scm left types... DONE
* are the mmtq loops correct? i.e. are supports correct?
  i.e. are duals being fetched correctly?
  I think it's correct, but WordClass was not right. FIXED.
* don't recomp similarity if present. DONE.
* what is make-gram-class-api ?? not used anywhere! In gram-class-api.scm
  Yeah, fixed. Now it's key to clustering.
* who calls store-aux? No one, it seems ... IGNORE.
  I guess the run files maybe?? should do this ??
* Need to fetch member links, too!? DONE now in gram-class-api
* review trim stuff... as compared to gram-class-api.scm DONE
* fix trim instructions. DONE
* backwards order in names for clusters. DONE add-cluster-gram 'make-cluster
* fix run files to call add-gram-class-api DONE not needed
* Need to update instructions. DONE
* duplicate cluster names how will that work ?? NOT AN ISSUE.
  Let the code do its thing.

todo --
* Maybe do member management in add-cluster-gram ??

-----------------
overlap cosine

(define pcos (add-similarity-compute sha))
(define (get-cosine wa wb) (pcos 'right-cosine wa wb))
(get-cosine (WordClass "###LEFT-WALL### :") (Word "###LEFT-WALL###")) ; 0.480889
(get-cosine (WordClass "###LEFT-WALL### :") (Word ":")) ; 0.35835
(get-cosine (Word "###LEFT-WALL###") (Word ":")) ; 0.33167
(define (get-ovlap wa wb) (pcos 'right-overlap wa wb))
(get-ovlap (Word "###LEFT-WALL###") (Word ":")) ; 8.74631e-4

Wow. So overlap is tiny! but cosine is huge! That means almost all
disjuncts have a small count (so as to keep the cosine denominator
as small as possible)

(define ecnt 0)
(define bcnt 0)
(define (prt DJA DJB) (format #t "duude ~A" DJ) 0)
(define opr (add-support-compute (add-tuple-math sha prt 'right-element)))
(define (dct x y)
   (if (or (< 0 x) (< 0 y)) (set! ecnt (+ 1 ecnt)))
   (if (and (< 0 x) (< 0 y)) (set! bcnt (+ 1 bcnt)))
   0)
(define dpr (add-support-compute (add-tuple-math sha dct 'get-count)))
(dpr 'right-count (list (Word "###LEFT-WALL###") (Word ":")))

(define (add-id OBJ)
   (lambda (message . args)
      (case message
         ((ident) args)
         (else (apply OBJ (cons message args))))))
(define (dum SXA SXB)
   (define x (if (equal? 0 SXA) 0 (get-count (car SXA))))
   (define y (if (equal? 0 SXB) 0 (get-count (car SXB))))
   (if (or (< 0 x) (< 0 y)) (set! ecnt (+ 1 ecnt)))
   (when (and (< 0 x) (< 0 y)) (set!
bcnt (+ 1 bcnt))
      (format #t "duude ~A" SXA))
   0)

(define ppr (add-support-compute (add-tuple-math (add-id sha) dum 'ident)))

------------
and again: (`is was`, `had has`)
(define wc (WordClass "had has"))

Even after second run, there are some shapes with connectors in them
((CrossSection (ctv 1 0 12)
   (WordNode "always")
   (ShapeLink
      (WordClassNode "had has")
      (Connector (WordNode "She") (ConnectorDir "-"))
      (Connector (VariableNode "$connector-word") (ConnectorDir "+"))
      (Connector (WordNode "been") (ConnectorDir "+"))))
)

Did we fail to scan.. what? What did we forget?
The passes did not iterate over shapes... and to convert those
into the orig sections...

Need manual check
1) for each shape having word/word-class, get all sections they belong
   to, and verify counts. Are the counts still good?

make-disinfo make-midisc make-mifuzz make-discrim make-fuzz
accum-counts
reshape-merge
(ACCUMULATE LLOBJ CLUST SECT WEIGHT)
make-merger

empty cluster, use accum fun that knows the pairs and votes.
non-empty cluster... use accum fun tailored for that.

no xes for the fraction if one is missing ... then...

(define (reshape-crosses MRG W PR WEI)
   (reshape-merge LLOBJ CLS MRG W PR WEI ACCUMULATE))

clustsection is made anyway, call
   (reshape-merge LLOBJ CLS clustsect WA SECTA frac accfun)
where accfun is the old counct-acc

(clique LLOBJ CLUST SECT) calls
   (ACCUMULATE LLOBJ clustsect SECTA frac)
where ACCUMULATE is the old counct-acc whose sig is
   (accumulate-count LLOBJ clustsec secta FRAC NOISE)

-----
OK, so (j, a- b- e+) cnt= 15.75 is correct,
But it's not knocked down because ...
[e, ] cnt = 12.75 this should be absent.

(j,egh) from extra-j
[e, ] cnt = 15.75 that's right.

cons:
setup-j-extra adds (j,abe) and (j-egh)
setup-e-extra adds : (e, abe) and (e abj)

consext crosses on e should be
(1-p) * [e, ] -- this is seen OK

this one:
(1-p) * [e, ] should be absorbed but was not.
The cross-section merge should have picked this up!

so first pass converts fine.
The bad version:
---
duude elt before 15.750 * [e, ]
prelique frac=0.25 on 5.250 * [{e j}, ] . 15.750 * [e, ]
+++ accumulate never called! good version doesn't either. +++
postique frac=0.25 on 0.000 * [{e j}, ] . 15.750 * [e, ]
duude elt after 15.750 * [e, ]
---
so that ain't good, but later on,

good version:
duude elt before 0.000 * (j, a- b- e+)
prelique frac=1.0 on 0.000 * ({e j}, a- b- e+) . 0.000 * (j, a- b- e+)
postique frac=1.0 on 0.000 * ({e j}, a- b- e+) . 0.000 * (j, a- b- e+)
duude elt after 0.000 * (j, a- b- e+)

Does the above do it???

duude elt before 0.000 * [e, ]
prelique frac=0.25 on 5.250 * [{e j}, ] . 0.000 * [e, ]
postique frac=0.25 on 0.000 * [{e j}, ] . 0.000 * [e, ]
duude elt after 0.000 * [e, ]

In the good version, [e, ] somehow got zeroed... how?
because (j, abe) got ...zeroed???
it was zeroed in the first round. why not zeroed in the second round?

the matching sections should be
(1-p) * (j, egh)
(1-p) * (j, abe) -- no!!

====================================================
reshape-merge is called with a FRAC that it passes to accumulate-count
That frac comes from clique ...
Clique sets it to fractional if sections don't overlap.

clique is called with either ACCUMULATE (from higher reaches, the grand
decision maker) or clique is called with accumulate-count from reshape.

Higher fun really should get same args as reshape-merge.
i.e. ACC-FUN/ACCUMULATE should make the if frac decision
so FRAC_FUN disappears from the signature

can we change signature on ACC-FUN aka ACCUMULATE to match
the more comprehensive signature on reshape-merge
which is currently (reshape-merge OBJ CLS MRGECT WA SECT FRAC)
that would mean making the signature on accumulate-count
match that on reshape-merge

Again: clique does not need the CLS argument, it's available externally.
Maybe clique should get the dj argument?

clique is called with accumulate-count during assign
   w/numeric frac
clique is called with reshape-merge during merge step.
   w/numeric frac

reshape uses FRAC in only one place! ... but it's key.

-----------
old bug ...
In opencog/nlp/learn/gram-projective.scm:
    924:32  2 (merge (WordClassNode "would must") (WordClassNode "had has") )
    736:16  1 (merge-clusters _ _ _ _ _)

(define CLB (WordClassNode "had has"))
(cog-incoming-size-by-type CLB 'Section) ;10
(cog-incoming-size-by-type CLB 'CrossSection) ;11
(cog-incoming-size-by-type CLB 'Shape) ;10

but (check) doesn't complain because these are all balanced!

TODO: checkout branch r9-detailed-balance and halt before the merge
is started, dump all sections and cross-sections for these guys
into a file. Ouch. But I see no other alternative, right now.

ohh ... wait.. the failing sections are all of the form:

(Section (ctv 1 0 113)
   (WordClassNode "had has")
   (ConnectorSeq
      (Connector (WordNode "who") (ConnectorDir "-"))
      (Connector (WordClassNode "would must") (ConnectorDir "+"))))

and bizarrely:

(CrossSection (ctv 1 0 113)
   (WordClassNode "had has")
   (ShapeLink
      (WordClassNode "would must")
      (Connector (WordNode "who") (ConnectorDir "-"))
      (Connector (VariableNode "$connector-word") (ConnectorDir "+"))))

which is NOT the explosion! so that's .. not right!
Note the balanced counts!

Also not deleted:
(cog-incoming-set
   (ShapeLink
      (WordClassNode "had has")
      (Connector (VariableNode "$connector-word") (ConnectorDir "-"))
      (Connector (WordNode "been") (ConnectorDir "+"))))

$7 = ((CrossSection (ctv 1 0 6)
   (WordClassNode "would must")
   (ShapeLink
      (WordClassNode "had has")
      (Connector (VariableNode "$connector-word") (ConnectorDir "-"))
      (Connector (WordNode "been") (ConnectorDir "+"))))
)

curious ... need to examine the before state...
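For reference, a toy model of the "explosion" that the check above is comparing against. This is a minimal Python sketch, not the AtomSpace API: plain tuples stand in for Atoms, and `explode` and `VAR` are hypothetical names. It assumes each connector in a Section produces one CrossSection, headed by the connector's word, whose shape keeps the germ first and has the variable in that connector's slot. That assumption matches the `always` / `had has` CrossSection seen earlier in these notes.

```python
# Toy model: a connector is (word, direction); a section is (germ, connectors).
VAR = "$connector-word"

def explode(germ, connectors):
    """Return the cross-sections implied by one section.

    For each connector, that connector's word heads a cross-section,
    and its slot in the shape is replaced by VAR.
    """
    crosses = []
    for i, (word, _direction) in enumerate(connectors):
        shape = [germ] + [
            (VAR, d) if j == i else (w, d)
            for j, (w, d) in enumerate(connectors)
        ]
        crosses.append((word, tuple(shape)))
    return crosses

# The failing section from the notes.
section = ("had has", [("who", "-"), ("would must", "+")])
for head, shape in explode(*section):
    print(head, shape)
```

Under this model the section yields a cross-section headed by `would must` with `had has` as the germ in the shape. The CrossSection actually found has the two swapped, which is why it is not the explosion.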
Speed:
(define (get-prods LLOBJ WA WB)
   (define killer (uniquely-named-variable))
   (define dj-var (uniquely-named-variable))
   (define a-term (LLOBJ 'make-pair WA dj-var))
   (define b-term (LLOBJ 'make-pair WB dj-var))
   (define r-type (LLOBJ 'right-type))
   (define qry (Bind
      (TypedVariable dj-var r-type)
      (Present a-term b-term)
      (List a-term b-term killer)))
   (define e (make-elapsed-secs))
   (define termlist (cog-value->list (cog-execute! qry)))
   (format #t "duude got ~A entries in ~A sec\n" (length termlist) (e))
   (define prod (fold (lambda (WPR CNT)
         (define acnt (LLOBJ 'get-count (gar WPR)))
         (define bcnt (LLOBJ 'get-count (gdr WPR)))
         (+ CNT (* acnt bcnt)))
      0 termlist))
   (format #t "duude got prod ~A in ~A sec\n" prod (e))
   (cog-extract-recursive! killer)
   (format #t "duude extract in ~A sec\n" (e))
)

(get-prods covr-obj (Word "the") (Word "his"))
(olde covr-obj (Word "the") (Word "his"))
(newe covr-obj (Word "the") (Word "his"))

vs.

(define SIM-ID "foo")
(define (olde LLOBJ WA WB)
   (define sim (make-simmer LLOBJ))
   (define e (make-elapsed-secs))
   (sim WA WB)
   (format #t "duude sim in ~A sec\n" (e)))

psu 'set-left-marginals
psu 'set-right-marginals
atc 'set-mmt-marginals

(define*-public (add-fast-mi-compute LLOBJ
   #:optional (GET-CNT 'get-count))

(define (make-faster LLOBJ)

(define (newe LLOBJ WA WB)
   (define sim (make-faster LLOBJ))
   (define e (make-elapsed-secs))
   (sim WA WB)
   (format #t "duude sim in ~A sec\n" (e)))

(define po (add-support-compute (add-tuple-math star-obj *)))
(define pn (add-support-compute (add-fast-math star-obj *)))
(po 'right-count (list (Word "the") (Word "his"))) ; 12738761.0
(pn 'right-count (list (Word "the") (Word "his"))) ; 7806378.0

(define star-obj sapi)
(po 'right-count (list (Word "chicken") (Word "dog")))
(pn 'right-count (list (Word "chicken") (Word "dog")))

(define cosi (add-similarity-compute covr-obj))
(cosi 'right-product (Word "the") (Word "his")) ; 5593136.0

aiee three different results!
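What get-prods is meant to compute is the sum, over disjuncts shared by both words, of the product of the two counts. A minimal Python sketch of that quantity, with dictionaries standing in for rows of the matrix. The name `right_product` and the counts are made up for illustration, and none of this is the matrix API.

```python
def right_product(row_a, row_b):
    """Sum of count products over the disjuncts present in both rows."""
    shared = row_a.keys() & row_b.keys()
    return sum(row_a[dj] * row_b[dj] for dj in shared)

# Made-up counts, for illustration only.
the = {"dj1": 3.0, "dj2": 5.0, "dj3": 1.0}
his = {"dj2": 2.0, "dj3": 4.0, "dj4": 7.0}

print(right_product(the, his))  # 5*2 + 1*4 = 14.0
```

All implementations should agree on this number for the same data. If a pair lookup silently misses some shared disjuncts, the sum comes out low.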
foo wait: (po 'right-count (list (Word "the") (Word "his"))) ; 5593136.0 (pn 'right-count (list (Word "the") (Word "his"))) ; 4171678.0 (cosi 'right-product (Word "the") (Word "his")) ; 5593136.0 Oh, the make-pair on direct sum was returning incomplete results (define (comp WA WB) (define e (make-elapsed-secs)) (define prn (pn 'right-count (list WA WB))) (format #t "new got ~A in ~A\n" prn (e)) (define pro (po 'right-count (list WA WB))) (format #t "old got ~A in ~A\n" pro (e))) (comp (Word "the") (Word "his")) ============================================================== (define qrya (BindLink (TypedVariableLink (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C") (TypeChoice (TypeNode "ConnectorSeq") (TypeNode "ShapeLink"))) (PresentLink (ChoiceLink (Section (WordNode "his") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C")) (CrossSection (WordNode "his") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C"))) (ChoiceLink (Section (WordNode "the") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C")) (CrossSection (WordNode "the") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C")))) (ListLink (ChoiceLink (Section (WordNode "his") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C")) (CrossSection (WordNode "his") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C"))) (ChoiceLink (Section (WordNode "the") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C")) (CrossSection (WordNode "the") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C")))) )) (define qsa (cog-value->list (cog-execute! 
qrya))) (define qryb (BindLink (VariableList (TypedVariableLink (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C") (TypeChoice (TypeNode "ConnectorSeq") (TypeNode "ShapeLink"))) (TypedVariableLink (Variable "r1") (Signature (ChoiceLink (Section (WordNode "his") (TypeNode "ConnectorSeq")) (CrossSection (WordNode "his") (TypeNode "ShapeLink"))))) (TypedVariableLink (Variable "r2") (Signature (ChoiceLink (Section (WordNode "the") (TypeNode "ConnectorSeq")) (CrossSection (WordNode "the") (TypeNode "ShapeLink")))))) (PresentLink (ChoiceLink (Section (WordNode "his") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C")) (CrossSection (WordNode "his") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C"))) (ChoiceLink (Section (WordNode "the") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C")) (CrossSection (WordNode "the") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C")))) (ListLink (Variable "r1") (Variable "r2")) )) (define qsa (cog-value->list (cog-execute! qrya))) (define qryc (BindLink (VariableList (TypedVariableLink (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C") (TypeChoice (TypeNode "ConnectorSeq") (TypeNode "ShapeLink"))) (Variable "r1") (Variable "r2")) (AndLink (PresentLink (ChoiceLink (Section (WordNode "his") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C")) (CrossSection (WordNode "his") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C"))) (ChoiceLink (Section (WordNode "the") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C")) (CrossSection (WordNode "the") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C")))) (Identical (Variable "r1") (ChoiceLink (Section (WordNode "his") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C")) (CrossSection (WordNode "his") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C")))) (Identical (Variable "r2") (ChoiceLink (Section (WordNode "the") (VariableNode "$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C")) (CrossSection (WordNode "the") (VariableNode 
"$1159996965843-ukZpW48IhJwb9sl7DZMsIN1C"))))) (ListLink (Variable "r1") (Variable "r2")) )) (define qsc (cog-value->list (cog-execute! qryc))) OK, so this needs work. ============================================================== 'pair-count L R - Returns the total observed count on the pair (L,R) L must be an Atom of type 'left-type and likewise for R. 'get-pair L R - Returns the Atom holding the pair (L,R). The returned Atom will be of type 'pair-type. All statistics and information about this pair are attached as Values on this Atom. 'get-count P - Returns the total observed count on the pair P, where P is the Atom returned by 'get-pair. 'make-pair L R - Create the Atom holding the pair, if it does not already exist. 'left-element P - Return the atom on the left of the pair P. 'right-element P - Return the atom on the right of the pair P. These two together undo what 'make-pair creates. ((get-pair) (apply f-get-pair args)) ((make-pair) (apply f-make-pair args)) ((left-element) (apply f-left-element args)) ((right-element) (apply f-right-element args)) ((pair-count) (apply f-pair-count args)) ((get-count) (apply f-get-count args)) ((set-count) (apply f-set-count args)) ((move-count) (apply f-move-count args)) get-pair make-pair get-left-element get-right-element get-all-pairs fetch-all-pairs delete-all-pairs pair-count get-count set-count get-pair get-count make-pair get-left-element get-right-element OK ... It There are very similar. So they lead a clique. The clique is created, but none of the djs that made It There similar are accepted into the clique! So it's proposed again ... 507 a clque is built, (with She) but again, the rejected djs are rejected. a cliqe They He It There 454 again There It again They He It There 369 todo -- wtf Assign: Merged 2206 sections in 0.000 secs Were these merged or just looped over? Just looped over. 
0.7*4 = 2.8 -> round -> 3 compute-left-product (prod-obj 'left-count (list COL-A COL-B) prod-obj (add-support-compute (add-fast-math star-obj * GET-CNT))) #:optional (GET-CNT 'get-count) 'left-support 'left-count (define pn (add-support-compute (add-fast-math star-obj *))) (in-group-cluster covr-obj 0.7 200 100) 'left-product 'right-product add-support-compute 'left-length if default count ... GetCount in-group-cluster make-merge-majority assign-to-cluster no (assign-to-cluster LLOBJ CLS WA CLIQUE) (for-each (lambda (PAIR-A) (define cnt (CLIQUE LLOBJ CLS PAIR-A accumulate-count)) (when (< 0 cnt) (monitor-rate #f) ; increment only if counted! (set! accum-cnt (+ accum-cnt cnt)))) (LLOBJ 'right-stars WA)) add group sim (count-shared-conseq LLOBJ QUORUM WORD-LIST) optimal-in-group (sap 'pair-count (Word "There") (Word "He")) EPSI (find-in-group SIMFUN WA WB EPSILON TIGHTNESS CANDIDATES) count-shared-conseq make-merge-majority make-count-monitor ----------------------------------------- In-group size=2 overlap = 2 of 531 disjuncts, commonality= 0.38% In-group size=2: `THE` `(` ------ Assign: Merged 121 of 183 sections on `THE` in 0.0 secs ------ Assign: Merged 237 of 704 sections on `(` in 0.0 secs what's up with that? objdump -d --no-show-raw-insn Files: run-config/2-cogserver/cogserver-pairs-en.conf run-config/2-pair-conf-objdump.sh run-common/file-xform-process.sh run-common/split-objdump.pl run-common/submit-plain.pl HOWTO: cd run/2-word-pairs . ../run-config/0-pipeline.sh . ../run-config/2-pair-conf-objdump.sh ./pair-submit.sh ------ pair-submit.sh calls: ${COMMON_DIR}/process-corpus.sh which calls file-split-process.sh file-nosplit-process.sh file-xform-process.sh replace submit-one.pl by ... 
submit-plain.pl replace observe-text -> observe-window-24 (observe-text-mode "clique" winsz plain-text) (local-process plain-text observe-mode count-reach) (extract-type 'VariableNode) (load-atoms-of-type 'WordNode) (cog-get-atoms 'WordNode) batch-all-pair-mi in compute-mi.scm: make-central-compute (define ccg (make-central-compute covr-obj)) (ccg 'cache-all) (load "cogserver.scm") (define cset-obj (make-pseudo-cset-api)) (cset-obj 'fetch-pairs) (define star-obj compute-mst-marginals.sh ================================================================ Bug with left-right imbalance (setup-initial-similarities (in-group-cluster covr-obj 0.5 0.2 4 200 1) (print-matrix-summary-report star-obj) Error: left and right total pairs not equal! 74313.0 2746974.0 Error: left and right total counts not equal! 1339120.0 22637313.0 (define sup-obj (add-support-api star-obj)) (lsize (sup-obj 'total-support-left)) (rsize (sup-obj 'total-support-right)) (cog-keys (sup-obj 'wild-wild)) set-left-totals set-right-totals values have to be given (define asc (add-support-compute star-obj)) (asc 'set-left-totals) ; no change after doing this (asc 'set-right-totals) ; no change. above called from recompute-mmt-final ; Recompute marginals after merge. (for-each (lambda (WRD) (recompute-mmt LLOBJ WRD)) in-grp) (recompute-mmt-final LLOBJ) So we looped over the words in the in-group, and since some of thier counts were xfered, of course the remaining support changes. We failed to compute the support on the new word-class. ? (star-obj 'left-basis-size) ; 15084 (define lb (star-obj 'left-basis)) (define wc (car (filter (lambda (wrd) (equal? 'WordClassNode (cog-type wrd))) lb))) (cog-keys wc) ; empy set Yikes! err ... no that's normal ! (star-obj 'right-wildcard wc) (cog-keys (star-obj 'right-wildcard wc)) ; empty set .. yikes! 
expected:
(PredicateNode "*-MM^T Product Key cover-section-*")
(PredicateNode "*-Norm Key cover-section") ; from add-support-api
   from set-right-norms in support-obj
   from set-right-marginals in support-compute ...

(length (star-obj 'right-duals wc))

initially (print-matrix-summary-report star-obj)
Error: left and right total pairs not equal! 74313.0 2746974.0
Error: left and right total counts not equal! 1339120.0 22637313.0

after (asc 'set-left-totals) get
Error: left and right total pairs not equal! 353652.0 2746974.0
Error: left and right total counts not equal! 5995073.0 22637313.0

after (asc 'set-right-totals) get
Error: left and right total pairs not equal! 353652.0 2774209.0
Error: left and right total counts not equal! 5995073.0 22925622.0

So that is better but still not good. What's going wrong here????

before: MM^T support=201433561.0 count=132179898227.0 entropy=18.228
after:  MM^T support=201712900.0 count=133473840444.0 entropy=18.218

(define sup-obj (add-support-api star-obj))

OK, so what are the counts?
post-merge:
(sup-obj 'right-support wc) ; 27235.0

(define mgl (list (WordClassNode "+ — “ ” _")
   (Word "+") (Word "—") (Word "“") (Word "”") (Word "_")))

(for-each (lambda (w)
      (format #t "yo ~A sup= ~D cnt= ~D\n" (cog-name w)
         (sup-obj 'right-support w)
         (sup-obj 'right-count w)))
   mgl)

(fold (lambda (itm tot) (+ tot (sup-obj 'right-count itm))) 0 mgl)

yo + — “ ” _ sup=27235.0 cnt=288309.0
yo + sup=142.0 cnt=5875.0
yo — sup=284.0 cnt=4276.0
yo “ sup=3826.0 cnt=78305.0
yo ” sup=5089.0 cnt=160301.0
yo _ sup=1689.0 cnt=42831.0
fold: support 38265.0 (fold support only of words: 11030.0)
count post-merge: 579897.0

pre-merge:
yo + sup= 910.0 cnt= 21572.0
yo — sup= 4225.0 cnt= 24464.0
yo “ sup= 13758.0 cnt= 238994.0
yo ” sup= 13498.0 cnt= 217805.0
yo _ sup= 9633.0 cnt= 94084.0
fold: support pre-merge 42024.0
count pre-merge: 596919.0

so ... counts are leaking.
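The size of the leak can be read off the two tables above. A small Python check using the (support, count) figures quoted in these notes; the helper `totals` is a hypothetical name, not part of any API.

```python
# (support, count) per word, copied from the tables above.
pre = {
    "+": (910, 21572), "—": (4225, 24464), "“": (13758, 238994),
    "”": (13498, 217805), "_": (9633, 94084),
}
post = {
    "+ — “ ” _": (27235, 288309),
    "+": (142, 5875), "—": (284, 4276), "“": (3826, 78305),
    "”": (5089, 160301), "_": (1689, 42831),
}

def totals(table):
    """Sum the supports and the counts over all rows of the table."""
    sup = sum(s for s, _ in table.values())
    cnt = sum(c for _, c in table.values())
    return sup, cnt

pre_sup, pre_cnt = totals(pre)
post_sup, post_cnt = totals(post)
leak = pre_cnt - post_cnt

print(pre_sup, pre_cnt)    # 42024 596919
print(post_sup, post_cnt)  # 38265 579897
print(leak)                # 17022
```

The support is expected to change, since merged disjuncts collapse onto the class. The count is expected to be conserved, so the 17022 difference is what has to be tracked down.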
More manual checking:
(length (star-obj 'right-stars (Word "+"))) ; 910
So obviously, the support before and after will differ.
The counts should not. ... but the counts do.

(in-group-cluster covr-obj 0.5 0.2 4 200 1)

(define merge-majority (make-merge-majority

(ACC-FUN LLOBJ (LLOBJ 'make-pair CLUST DJ) SECT 1.0)
line 153 of gram-majority.scm

(sup-obj 'right-count

Even pair-wise: make-merge-pair

gram-majority.scm line 153 calls accumulate-count
at gram-projective.scm line 306 which is impl at line 210
should use 'move-count ... or not.

Two places where count shuffles:
1) the orig merge
2) the connectors merge.
   -- merge-connectors which calls reshape, line 364
   calls reshape-merge which returns "unspecified."

------------------
From scratch by hand.
(define cls (WordClassNode "+ — “ ” _"))
(define WLIST (list (Word "+") (Word "—") (Word "“") (Word "”") (Word "_")))

; assign-to-cluster (for-each ... rstars)
(define rstars (star-obj 'right-stars (Word "+")))

(define QUORUM 0.5)
(define wlen (length WLIST))
(define vote-thresh
   (if (equal? wlen 2) 2
      (inexact->exact (round (* QUORUM wlen)))))

(define (vote-to-accept? DJ)
   (<= vote-thresh
      (fold (lambda (WRD CNT)
            (if (nil? (star-obj 'get-pair WRD DJ)) CNT (+ 1 CNT)))
         0 WLIST)))

; (clique star-obj cls rstar accumulate-count)
(define NOISE 4)
(define (clique LLOBJ CLUST SECT ACC-FUN)
   (define DJ (LLOBJ 'right-element SECT))
   (when (or (<= (LLOBJ 'get-count SECT) NOISE)
         (vote-to-accept? DJ))
      (ACC-FUN LLOBJ (LLOBJ 'make-pair CLUST DJ) SECT 1.0)
      (format #t "Yes, merged into ~A\n" (LLOBJ 'make-pair CLUST DJ))
   ))

(clique star-obj cls (list-ref rstars 0) accumulate-count)

(print-matrix-summary-report star-obj)
(recompute-mmt star-obj (Word "+"))
(recompute-mmt star-obj (WordClassNode "+ — “ ” _"))
(recompute-mmt-final star-obj)
(print-matrix-summary-report star-obj)
; Wow. insta-breakage!
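The quorum vote used by vote-to-accept? above, restated as a standalone Python sketch. Sets of disjuncts stand in for the rows of each word; `vote_threshold` and `vote_to_accept` are hypothetical names, and the rows are made up. Python's `round`, like Scheme's, rounds halves to the nearest even integer, so a quorum of 0.5 on five words gives a threshold of 2.

```python
def vote_threshold(quorum, nwords):
    # Two-word groups need both words to agree, as in the Scheme above.
    if nwords == 2:
        return 2
    # Halves round to the nearest even integer, matching Scheme's round.
    return round(quorum * nwords)

def vote_to_accept(dj, rows, quorum):
    """True if enough of the words have the disjunct dj.

    rows is a list with one set of disjuncts per word in the in-group.
    """
    votes = sum(1 for row in rows if dj in row)
    return vote_threshold(quorum, len(rows)) <= votes

# Made-up rows for five words, for illustration only.
rows = [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}, {"a"}]
print(vote_threshold(0.5, 5))         # 2
print(vote_threshold(0.7, 4))         # 3
print(vote_to_accept("a", rows, 0.5)) # True
print(vote_to_accept("c", rows, 0.5)) # False
```

The clique function in the notes additionally merges any section whose count is at or below NOISE, regardless of the vote; that part is left out here.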
(define LLOBJ star-obj) (define psu (add-support-compute star-obj)) ; (define sup (add-support-api star-obj)) (define plus-marg (star-obj 'right-wildcard (Word "+"))) (cog-keys plus-marg) (define nkey (PredicateNode "*-Norm Key cover-section")) (cog-value plus-marg nkey) ; (FloatValue 910 21572 3314.388329692222 8610390.908307077) (psu 'set-right-marginals (Word "+")) (cog-value plus-marg nkey) ; (FloatValue 909 21570 3314.387726262575 8602093.322337482) ; yes: one less dj, 2 less count. (define cls-marg (star-obj 'right-wildcard cls)) (cog-value cls-marg nkey) (psu 'set-right-marginals cls) (cog-value cls-marg nkey) ; (FloatValue 1 2 2 2) ; yes: one dj, count of 2 (define ww (star-obj 'wild-wild)) ; Set in the support obj (define rtkey (PredicateNode "*-Right Total Key cover-section")) ; set in the report obj (define rnkey (PredicateNode "*-Right Norm Key cover-section")) (cog-value ww rtkey) ; (FloatValue 2777968 22942644) ; above is number of pairs grand total, and total counts. (cog-value ww rnkey) ; (FloatValue 32439.90696333866 321782.1216717655 8093.262183027697 7966.527548438474) ; above is .. the wacky averges that report reports. (psu 'set-right-totals) ; report is still good. (cog-value ww rtkey) ; still good . ----- -----------------============= ---------------- ========= (define WRD (Word "+")) (define LLOBJ star-obj) (define rduals (LLOBJ 'right-duals WRD)) (length rduals) ; 910 correct (define gms star-obj) (gms 'wild-wild) (gms 'left-wildcard (list-ref rduals 0)) ; as expected (cog-keys (gms 'left-wildcard (list-ref rduals 0))) There are no marginals on this. because ... I guess they were not computed ?? Because (batch-transpose 'mmt-marginals doesn't do it!? Lets see: (setup-supports) (batch-left-support) (batch-right-support) print statement says: Finished left norm marginals in 3232 secs Finished left totals in 71 secs ... Finished right norm marginals in 130 secs Finished right totals in 2 secs so they are computed ... 
so where are they stored ??? (scomp-obj (add-support-compute star-obj)) (scomp-obj 'all-left-marginals) (store-obj (make-store star-obj)) (store-obj 'store-left-marginals) Crap. (fetch-atom (list-ref rduals 0)) (cog-keys (list-ref rduals 0)) and now we have a key. They were computed and stored but not fetched. Oooops. (covr-obj 'fetch-pairs) delegates to direct-sum which calls fetch on both parts. ... XXX does ot fetch wild wild!??? add-gram-class-api 'fetch-pairs delegates to pseudo-csets.scm which does fetch marginals add-shape-vec-api 'fetch-pairs calls ... (fetch-incoming-set any-left) (fetch-incoming-set any-right) but doesn't do the disjuncts correctly line 504 ; The left-wildcard really should be ; (ListLink any-left R-ATOM) but we've already ; blown too much storage creating atoms, so keep ; it simple, here. (define (get-left-wildcard R-ATOM) R-ATOM) ZZZZZZZZZZZZZZZZZZZZZZZZZZZZ Found it at last. Or maybe not. wtf. (define nkey (PredicateNode "*-Norm Key cover-section")) (cog-value (gms 'left-wildcard (list-ref rduals 4)) nkey) Problems: 1) lots of these have a support of 1, which means they were not trimmed correctly. so wtf. (fetch-atom (list-ref rduals 0)) ; --- OK,, so was not fetched. Again. Why? add-shape-vec-api ; (define cset-obj (make-pseudo-cset-api)) ;already done (define gram-obj (add-gram-class-api cset-obj)) (define stobj (add-pair-stars gram-obj)) (define shape-obj (add-shape-vec-api stobj)) (define shape-stars (add-pair-stars shape-obj)) (define rb (define rb (shape-stars 'right-basis)) length is 838580 (cog-keys (list-ref rb 0)) ... empty set! WTF!? Is it because I have to fetch after explode!? In shapely, llobjid=gram-class In shapely, right basis size= 205003 -- these are the connector seqs only, ---- and thus useless. keys at 42=() (define cover-obj (direct-sum stobj shape-stars)) (define cover-stars (add-pair-stars cover-obj)) did ((make-store LLOBJ) 'store-wildcards) work correctly? 
yes, it stores (llobj 'right-wildcard x)) looped over left basis, etc. (for-each (lambda (DJ) (psu 'set-left-marginals DJ)) (LLOBJ 'right-duals WRD)) (length (LLOBJ 'right-duals WRD)) ; 910 correct (for-each (lambda (DJ) (psu 'set-left-marginals DJ)) (LLOBJ 'right-duals cls)) (length (LLOBJ 'right-duals cls)) ; 1 right. (psu 'set-left-totals) ; report is now borken. has 3704 total number of pairs. wtf. <<<<<<<< (define ltkey (PredicateNode "*-Left Total Key cover-section")) (cog-value ww ltkey) ; (FloatValue 3704 90475) -- clearly wrong. But how? ; suppo line 600 compute-total-support-from-left line 539 (length (star-obj 'right-basis)) ; 1043583 (define sobj (add-pair-stars star-obj)) (length (sobj 'right-basis)) (define api (add-support-api sobj)) (fold (lambda (item sum) (+ sum (api 'left-support item))) 0 (sobj 'right-basis)) ; 3704.0 wtf .... (define rb (sobj 'right-basis)) (length rb) ; 1043583 -- disjuncts and shapes (api 'left-support (list-ref rb 3)) ; 0 (define dj3 (list-ref rb 3)) ; a conseq (define dj2 (list-ref rb 2)) ; a shape (api 'id) ; cover-section hang on, why is this not the direct sum ? It is (add-covering-sections cset-obj) whcih gives that id. So OK, again. (cog-keys dj) shows as expected. (cog-keys dj2) does not ; perhaps we computed marginals incorectly, to begin with? Heck, yeah, wild-wild is (PredicateNode "*-Direct Sum Wild (gram-class⊕cross-section)") and it should be cover-section. YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY (length (cog-incoming-set (list-ref rb 3))) ; 4 two sects and two margs ; the two margs have no counts on them.... ; one is (AnyNode "cset-word") ; other is (AnyNode "gram-class") (define csm (list-ref (cog-incoming-set (list-ref rb 3)) 3)) (cog-keys csm) ; just one: (PredicateNode "*-Norm Key-*") ah hah, this is not (PredicateNode "*-Norm Key cover-section") (ID (LLOBJ 'id)) (define is-filtered? (and ID (LLOBJ 'filters?))) (api 'filters?) ; #t Who used the wrong key? ????????????????????????????????????????? 
(for-each (lambda (DJ) (psu 'set-left-marginals DJ)) <<<< ummmm (LLOBJ 'right-duals cls)) (length (LLOBJ 'right-duals cls)) ; 1 right. (cog-keys (car (LLOBJ 'right-duals cls))) ; has a key on it but should not!!!!!!!!!!!!!!!!!!! ; ............... backtracking from below, that is OK. Confusing but OK. ; ................ and the key is correct. So backtrack some more. ; the key is (PredicateNode "*-Norm Key cover-section") (define dj (car (LLOBJ 'right-duals cls))) ; correct the dual is a dj. ; dj is on the right. psu is add-support-compute. set-left-marginals calls api 'set-left-norms api is add-support-api and set-left-norms sets on 'left-wildcard ; but (psu 'left-wildcard dj) is still dj... ; .... which after below is correct. So backtrack to here. XXXXXXXXXXXX (psu 'set-right-marginals WRD) calls calls (psu 'right-wildcard WRD) which is (wrd,*) so that's correct. But (psu 'left-wildcard dj) is junk (missing) and (psu 'right-wildcard dj) is wrong and is not complaining. So -- direct sum is just wrong for left wilds. line 308 (direct-sum stars-obj shape-stars) in shape-vec line 612 ; (define cset-obj (make-pseudo-cset-api)) ;already done (define gram-obj (add-gram-class-api cset-obj)) (define stobj (add-pair-stars gram-obj)) (define shape-obj (add-shape-vec-api stobj)) (define shape-stars (add-pair-stars shape-obj)) (define cover-obj (direct-sum stobj shape-stars)) (define cover-stars (add-pair-stars cover-obj)) (stobj 'left-type) ; choice wor or wordclass (shape-stars 'left-type) ; same, so (define LLA stobj) (define LLB shape-stars) disjoint-left is #f disjoint-right is #t distinct-type is #t (define a-stars LLA) (define in-base? (make-aset-predicate (a-stars 'right-basis))) ; so in-base is a list of all dj's but not shapes. (define (type-a? L-ATOM R-ATOM) (in-base? R-ATOM)) ; so far so good, this is correctly id'ing plain dj's ;So onto left-wildcard ... and shapes reports: ... junk! 
(LLB 'left-wildcard dj) (shape-stars 'left-wildcard dj) ; junk (shape-obj 'left-wildcard dj) ; junk ; Well, except that is what shape told us to do. Confusing but true. ; So backtrack.... AAAAAAAAAAAAAAAAAAAAAAAAAAA What if one has a fresh remargin? recompute-mmt-final (define asc (add-support-compute star-obj)) (asc 'set-left-totals) (print-matrix-summary-report star-obj) Wham! It's borken! (asc 'set-right-totals) (print-matrix-summary-report star-obj) Does not restore the balance. Error: left and right total pairs not equal! 1922250.0 2777968.0 Error: left and right total counts not equal! 15629513.0 22942644.0 and specifically, its the left totals that are broken. The left total is a loop over disjuncts... (define lb (star-obj 'left-basis)) ; words, 15083 (define rb (star-obj 'right-basis)) ; djs, 1043583 (define nkey (PredicateNode "*-Norm Key cover-section")) (fold (lambda (atm cnt) (if (equal? 'ShapeLink (cog-type atm)) (+ cnt (cog-value-ref (cog-value atm nkey) 0)) cnt)) 0 rb) Above gives 1922250.0 which is ... as expected. (star-obj 'left-wildcard (list-ref rb 0)) (cog-keys (star-obj 'left-wildcard (list-ref rb 3))) Ahh Hah! No marginals on these! They were not loaded! Who loads these? (AnyNode "gram-class") (fetch-incoming-set (AnyNode "gram-class")) Wow. That seems to fix it. Phew. Well, not quite. Almost: Error: left and right total pairs not equal! 2745455.0 2774209.0 Error: left and right total counts not equal! 23046531.0 22925622.0 pre-merge, had Size: 2777968.0 observations: 22942644.0 So ... getting close now. BBBBBBBBBBBBBBBBBBBBBBBBBBBB Rerun one at a time from line 6514 above. (define cls (WordClassNode "+ — “ ” _")) (define (clique LLOBJ CLUST SECT ACC-FUN) (define DJ (LLOBJ 'right-element SECT)) (ACC-FUN LLOBJ (LLOBJ 'make-pair CLUST DJ) SECT 1.0)) (define rstars (star-obj 'right-stars (Word "+"))) (length rstars) ; 910 (define sex (filter (lambda (sec) (equal? 'Section (cog-type sec))) rstars)) (define xes (filter (lambda (sec) (equal? 
'CrossSection (cog-type sec))) rstars)) (length sex) ; 66 (length xes) ; 844 ... wow (define sexy (list-ref sex 0)) (cog-count sexy) ; 11 (star-obj 'left-element sexy) (star-obj 'right-element sexy) (star-obj 'right-wildcard (star-obj 'left-element sexy)) (define sup (add-support-api star-obj)) (sup 'right-support (star-obj 'left-element sexy)) ; 910 (sup 'right-count (star-obj 'left-element sexy)) ; 21572 (sup 'left-support (star-obj 'right-element sexy)) ; 31 (sup 'left-count (star-obj 'right-element sexy)) ; 2345 ; ------ ; Merge (clique star-obj cls sexy accumulate-count) (cog-count sexy) ; 0 ; ---- ; Recompute supports (define psu (add-support-compute star-obj)) (psu 'set-right-marginals (star-obj 'left-element sexy)) (sup 'right-support (star-obj 'left-element sexy)) ; 909 OK (sup 'right-count (star-obj 'left-element sexy)) ; 21561 is 11 less OK (psu 'set-right-marginals cls) (sup 'right-support cls) ; 1 OK (sup 'right-count cls) ;; 11 OK. ; ---- (psu 'set-left-marginals (star-obj 'right-element sexy)) (sup 'left-support (star-obj 'right-element sexy)) ; 31 no change OK, just moved (sup 'left-count (star-obj 'right-element sexy)) ; 2345 no change OK. ; ---- (sup 'total-support-left) ; 2777968.0 (sup 'total-support-right) ; 2777968.0 (sup 'total-count-left) ; 22942644.0 (sup 'total-count-right) ; 22942644.0 (psu 'set-left-totals) (sup 'total-support-left) ; 2777968.0 no change (sup 'total-count-left) ; 22942644.0 no change (psu 'set-right-totals) (sup 'total-support-right) ; 2777968.0 no change (sup 'total-count-right) ; 22942644.0 no change ; So .. that worked. ; ---- ; Verify bulk recompute is OK (for-each (lambda (DJ) (psu 'set-left-marginals DJ)) (star-obj 'right-duals cls)) (for-each (lambda (DJ) (psu 'set-left-marginals DJ)) (star-obj 'right-duals (WordNode "+"))) (psu 'set-left-totals) (psu 'set-right-totals) ; Still good ; ------- ; Verify merging all sections is not a problem. ; (define sex (filter (lambda (sec) (equal? 
'Section (cog-type sec))) rstars))
(for-each
   (lambda (sexy) (clique star-obj cls sexy accumulate-count))
   sex)

(psu 'set-right-marginals (WordNode "+"))
(psu 'set-right-marginals cls)
(for-each
   (lambda (DJ) (psu 'set-left-marginals DJ))
   (star-obj 'right-duals cls))
(for-each
   (lambda (DJ) (psu 'set-left-marginals DJ))
   (star-obj 'right-duals (WordNode "+")))

(psu 'set-left-totals)
(print-matrix-summary-report star-obj) ; Still good
(psu 'set-right-totals)
(print-matrix-summary-report star-obj) ; <<<<<<< bad!!

(sup 'total-support-left) ; 2777968.0 no change
(sup 'total-count-left) ; 22942644.0 no change
(sup 'total-support-right) ; 2777903.0 << shrank by 65 (one less than sections)
(sup 'total-count-right) ; 22938337.0 << shrank by 4307.0

(sup 'right-count cls) ; 11 WTF <<<<<<<<<<<<
(sup 'right-count (WordNode "+")) ; 17254.0 Maybe seems wrong.

(fold (lambda (sexy cnt)
      (+ cnt (cog-count (star-obj 'get-pair cls (star-obj 'right-element sexy)))))
   0 sex) ; 4318
; 4307.0 = 4318 - 11 which is what's missing on cls ... why?

(define cls-mrg (psu 'set-right-marginals cls))
(cog-keys cls-mrg)
(define pkey (PredicateNode "*-Norm Key cover-section"))
(cog-value cls-mrg pkey) ; <<<<<<<<<< wrong
; XXX So (psu 'set-right-marginals cls) is borken ... why?

(psu 'clobber)
(psu 'set-right-marginals cls)
(sup 'right-count cls) ; 4318.0 Yay!!!
(psu 'set-right-totals)
(print-matrix-summary-report star-obj) ; <<<<<<< Yay!!!

So the fix is ... be sure to clobber! Really?

(define (store-mmt WRD) (recompute-mmt LLOBJ WRD))
make-merge-pair

Nope. That still doesn't fix it. WTF. Is it the cross sections?
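
Aside: a toy model of the balance check used above. This is only a sketch in plain Guile, with no AtomSpace; the pair list, the names w1, w2, d1, d2 and the stale-cache alist are all made up for illustration. Only the 4318 and 11 are borrowed from the cls marginal above. It shows that when every marginal is freshly computed, the per-word sum and the per-disjunct sum both equal the grand total, and that a single stale cached word marginal makes one total come up short by exactly the stale deficit.

```scheme
(use-modules (srfi srfi-1))

; Toy sparse matrix: each entry is (word disjunct count).
; Hypothetical data; only 4318 and 11 echo the notes.
(define pairs
  '((w1 d1 3) (w1 d2 5) (w2 d1 7) (cls d2 4318)))

(define (row-of p) (car p))
(define (col-of p) (cadr p))
(define (cnt-of p) (caddr p))

; Sum the counts of all entries whose KEY-FN equals KEY.
(define (marginal key-fn key)
  (fold (lambda (p acc)
          (if (eq? key (key-fn p)) (+ acc (cnt-of p)) acc))
        0 pairs))

(define rows (delete-duplicates (map row-of pairs)))
(define cols (delete-duplicates (map col-of pairs)))

; With fresh marginals, both totals equal the grand total.
(define total-from-rows
  (apply + (map (lambda (r) (marginal row-of r)) rows)))
(define total-from-cols
  (apply + (map (lambda (c) (marginal col-of c)) cols)))

; Assumption: a cache that still holds the pre-merge value for cls.
(define stale-cache '((cls . 11)))
(define (cached-row-marginal r)
  (let ((hit (assq r stale-cache)))
    (if hit (cdr hit) (marginal row-of r))))
(define stale-total (apply + (map cached-row-marginal rows)))

; The shortfall is exactly the true marginal minus the stale one.
(define deficit (- total-from-cols stale-total))
```

In this model the deficit comes out as 4318 - 11, which is the same arithmetic as the 4307.0 shrinkage seen above. Whether the real cause is a stale cached value is not established by this sketch.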
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
(in-group-cluster covr-obj 0.5 0.2 4 200 1)

(define psu (add-support-compute star-obj))
(define sup (add-support-api star-obj))
(sup 'total-support-left) ; 2745455.0 got smaller from 2777968.0
(sup 'total-count-left) ; 23046531.0 got larger from 22942644.0
(sup 'total-support-right) ; 2774209.0 smaller from 2777968.0
(sup 'total-count-right) ; 22925622.0 smaller from 22942644.0 no change

(define (recomp WRD)
   (psu 'set-right-marginals WRD)
   (for-each
      (lambda (DJ) (psu 'set-left-marginals DJ))
      (star-obj 'right-duals WRD)))

(define WLIST (list (Word "+") (Word "—") (Word "“") (Word "”") (Word "_")))
(define cls (WordClassNode "+ — “ ” _"))
(for-each recomp WLIST)
(recomp cls)
(psu 'set-left-totals)
(psu 'set-right-totals)
; No change from above.

(psu 'clobber) and do it again,
(sup 'total-support-left) ; 353652.0 huge change
(sup 'total-count-left) ; 5995073.0 huge change
(sup 'total-support-right) ; 38265.0
(sup 'total-count-right) ; 579897.0 huge change... wtf

(sup 'clobber) ; wipes out everything!
OK, so that was a bad idea...

DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
(in-group-cluster covr-obj 0.5 0.2 4 200 1)
(print-matrix-summary-report star-obj)
(star-obj 'clobber)

(define psu (add-support-compute star-obj))
(define sup (add-support-api star-obj))
(define WLIST (list (Word "+") (Word "—") (Word "“") (Word "”") (Word "_")))
(define cls (WordClassNode "+ — “ ” _"))
(define (recomp WRD)
   (psu 'set-right-marginals WRD)
   (for-each
      (lambda (DJ) (psu 'set-left-marginals DJ))
      (star-obj 'right-duals WRD)))
(for-each recomp WLIST)
(recomp cls)
(psu 'set-left-totals)
(psu 'set-right-totals)
(print-matrix-summary-report star-obj)

EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
; Initial count: 22942644.0
; merge everything now...
(define mrg (make-merge-majority star-obj 0.5 4 #f))
(define WLIST (list (Word "+") (Word "—") (Word "“") (Word "”") (Word "_")))
(define cls (mrg WLIST))
; (define cls (WordClassNode "+ — “ ” _"))
(star-obj 'clobber)
(for-each (lambda (WRD) (recompute-mmt star-obj WRD)) WLIST)
(recompute-mmt star-obj cls)
(recompute-mmt-final star-obj)
(print-matrix-summary-report star-obj)
; All good. count: 22942644.0 unchanged. Good

; merge-connectors in gram-projective.scm
(define wa (Word "+"))
(define (clik LLOBJ CLUST SECT ACC-FUN)
   (define DJ (LLOBJ 'right-element SECT))
   (ACC-FUN LLOBJ (LLOBJ 'make-pair CLUST DJ) SECT 1.0))
; (merge-connectors star-obj cls wa clik)

(define (reshape OBJ MRGECT SECT FRAC)
   (reshape-merge OBJ cls MRGECT wa SECT FRAC)
)

(define rstars (star-obj 'right-stars wa))
(define pra (list-ref rstars 0))
(clik star-obj cls pra reshape)

(star-obj 'clobber)
(recompute-mmt star-obj pra)
(recompute-mmt star-obj cls)
(recompute-mmt-final star-obj)
; XXX above merged connectors on just one x-section with zero count.
; Total counts are balanced, but count is 22942648.0 so it's larger by 4 XXX

(define pra (list-ref rstars 1)) ; xsect with cnt of 63
(print-matrix-summary-report star-obj)
; Counts still balanced, but total count is 22942648.0 so unchanged

(define (mgcon pra)
   (clik star-obj cls pra reshape)
   (star-obj 'clobber)
   (recompute-mmt star-obj pra)
   (recompute-mmt star-obj cls)
   (recompute-mmt-final star-obj))

(define sex (filter (lambda (sec) (equal? 'Section (cog-type sec))) rstars))
(length sex) ; 66
(define sexy (list-ref sex 0))
(mgcon sexy)
; Now it's borken.
Error: left and right total pairs not equal! 2774930.0 2774929.0
Error: left and right total counts not equal! 22942659.0 22942648.0

2774930.0 2774929.0 -- off by 1
22942659.0 22942648.0 -- off by 11, the count of the section

Hmm
22942659.0 = sum of counts over all djs, right? too large by 11
22942648.0 = sum of counts over all words... right?
(define all-elts (star-obj 'get-all-elts))
(fold (lambda (atm cnt) (+ cnt (cog-count atm))) 0 all-elts)
22942659.0 so that's the larger number, and it's too big

(define asa (add-support-api star-obj))
(define rall (star-obj 'right-basis)) ;;; this is all dj's
(fold (lambda (atm cnt) (+ cnt (asa 'left-count atm))) 0 rall)
22942659.0 so that's .. the larger number ...

(define lall (star-obj 'left-basis)) ;; this is all words.
(fold (lambda (atm cnt) (+ cnt (asa 'right-count atm))) 0 lall)
22942648.0 ... that's the orig number

(length lall) ; 15084 so one more than before...

FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
; Initial count: 22942644.0
; merge everything now...
(define mrg (make-merge-majority star-obj 0.5 4 #f))
(define WLIST (list (Word "+") (Word "—") (Word "“") (Word "”") (Word "_")))
(define cls (mrg WLIST))
; (define cls (WordClassNode "+ — “ ” _"))
(star-obj 'clobber)
(for-each (lambda (WRD) (recompute-mmt star-obj WRD)) WLIST)
(recompute-mmt star-obj cls)
(recompute-mmt-final star-obj)
(print-matrix-summary-report star-obj)
; All good. count: 22942644.0 unchanged. Good

; manual merge-connectors in gram-projective.scm
(define wa (Word "+"))
(define rstars (star-obj 'right-stars wa))
(define pra (list-ref rstars 0))
; pra is a x-section in the form
; [a, ] unit tested.
; It has a count of zero, because `a` has been merged.
;
; In clique, do this:
(define dj (star-obj 'right-element pra)) ; it's a shape link
; (define cls (WordClassNode "+ — “ ” _"))
(define msec (star-obj 'make-pair cls dj))
; msec already has count of 2.0 from the plain merge.

; manual (reshape-merge star-obj cls msec wa pra 1.0)
(define donor-type (cog-type pra)) ; So CrossSec

; manual (merge-resects star-obj cls wa msec pra)
; line 187 ff shape-project.scm
(define resect (star-obj 'make-section msec)) ; moves cls into connector
; resect has count of 0; it's garbage because it's partial.
; resect is (c, ka{g}am)
; the original is (c, kaaam) which still has a count of 2 on it.
; (define origd .... is donor below.
(define germ (star-obj 'left-element resect)) ; (WordNode "=Lit")
(define mgsf (star-obj 'flatten cls resect)) ; cls in all connectors.
; mgsf is (c, k{g}{g}{g}m) which is right, and it has no counts on it
(define donor (star-obj 'make-section pra))
; donor is (c, kaaam) and has count of 2 exactly as expected.

(define d-cnt 2) ; donor's count
(define x-cnt 2) ; xsect count; this is msec, and already has count of 2.

; pra is [a, ] has count of 0
; [{g}, ] has count of 2
; [{g}, ] has count of 2
; [{g}, ] has count of 4 ... whoa

(rebalance-count star-obj resect 0) ; zeros count on sect (it's already zero)
; (star-obj 'get-cross-sections resect)
; and also zeros count on one xsection.
; XXX??? resect has only one existing xsect, which is the original xsect.
; None of the others exist. So above reduces count by 4 grand total.

(star-obj 'make-cross-sections mgsf) ; line 203
; the above makes 5 xsects total (there were 5 connectors)

(rebalance-count star-obj mgsf 2) ; puts 2 on the above 5 plus on mgsf
; so above increases total count by 12.
; so far that is 12 - 4 = count increase of +8

(rebalance-count star-obj donor 2) ; donor already has count of 2.
; (star-obj 'get-cross-sections donor)
; above shows 5 xsections. Two have counts. The other three don't .. why not?
; so above adds +6 to total count.
; XXX but this is wrong, the count on the donor should be zeroed.
; so wtf !??

;;;;;;
(star-obj 'clobber)
(recompute-mmt star-obj pra)
(recompute-mmt star-obj cls)
(recompute-mmt-final star-obj)
; XXX above merged connectors on just one x-section with zero count.
; Total counts are balanced, but count is 22942648.0 so it's larger by 4 XXX

GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
Unit test is now bonking.
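
Aside, on the -4, +12, +6 bookkeeping in the manual merge-resects walk-through (section F): a small sketch of how overwriting counts changes the grand total. It assumes, as the notes suggest, that rebalance-count sets the same count on a section and on each of its cross-sections. The helper name rebalance-delta is hypothetical, and the per-cross-section counts in the lists are assumptions chosen to match what the notes report (one existing xsect holding 4; two of the donor's five xsects holding 2 each).

```scheme
; A section together with its cross-sections, modeled as a plain
; list of counts: first the section, then its cross-sections.
; Returns the change in the grand total when every count in the
; list is overwritten with NEW-CNT.
(define (rebalance-delta counts new-cnt)
  (- (* new-cnt (length counts)) (apply + counts)))

; resect (count 0) plus its one existing xsect (assumed count 4),
; all set to zero.
(define delta-resect (rebalance-delta '(0 4) 0))

; mgsf plus 5 freshly made xsects, all zero, all set to 2.
(define delta-mgsf (rebalance-delta '(0 0 0 0 0 0) 2))

; donor (count 2) plus 5 xsects, two assumed to hold 2, all set to 2.
(define delta-donor (rebalance-delta '(2 2 2 0 0 0) 2))
```

Under those assumptions the three deltas are -4, +12 and +6, matching the running commentary. The point of the sketch is only that a "set" style rebalance changes the total by (new count times number of atoms) minus whatever was there before, so it conserves the total only when the old counts already sum to that product.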
; gram-pairwise.scm line 305
; calls start-cluster
; calls merge-connectors line 110 with clik:
(define (clique LLOBJ CLUST SECT ACC-FUN)
   (define WRD (LLOBJ 'left-element SECT))
   (define DJ (LLOBJ 'right-element SECT))
   (define WOTHER (if (equal? WRD WA) WB WA))
   (define OTHSEC (LLOBJ 'get-pair WOTHER DJ))
   (if (nil? OTHSEC)
      (if (<= (LLOBJ 'get-count SECT) NOISE)
         (ACC-FUN LLOBJ (LLOBJ 'make-pair CLUST DJ) SECT 1.0)
         (if (< 0 frac-to-merge)
            (ACC-FUN LLOBJ (LLOBJ 'make-pair CLUST DJ) SECT frac-to-merge)))
      (ACC-FUN LLOBJ (LLOBJ 'make-pair CLUST DJ) SECT 1.0)
   )
)

; manual merge-connectors in gram-projective.scm line 333
(define wa (Word "a"))
(define rstars (gsc 'right-stars wa)) ; there are 6
(define pra (list-ref rstars 0))
(define cls (WordClass "a b"))
; pra is count of 0

; calls clique: line 370
(define dj (gsc 'right-element pra)) ; [c, kavbm]
(define wot (Word "b"))
(define osc (gsc 'get-pair wot dj)) ; nil
; (ACC-FUN gsc (gsc 'make-pair cls dj) pra 1.0)
; (reshape-merge gsc cls (gsc 'make-pair cls dj) wa pra 1.0)
; (reshape-merge LLOBJ GLS MRG W DONOR FRAC)

; reshape-merge in shape-project.scm line 221
; calls merge-resects line 283
; (merge-resects gsc cls wa (gsc 'make-pair cls dj) pra)
; (merge-resects LLOBJ GLS W XMR XDON)
(define xmr (gsc 'make-pair cls dj))
; xmr is <{ab}, [c, kavbm]> count of 23
(define resect (gsc 'make-section xmr))
; resect is junky (c,ka{ab}bm) with no count
(define mgsf (gsc 'flatten cls resect))
; mgsf is nice with no count.

; next: line 196
(define donor (gsc 'make-section pra))
; donor is (c, kaabm) with count 23 looks good
(define d-cnt 0)
(define x-cnt 23)
; set resect count to 0 OK was none
(gsc 'make-cross-sections mgsf) ; cool. None have counts yet.
; (rebalance-count gsc mgsf x-cnt)
; sets count on mgsf and all crosses. Looks AOK

(define (set-count ATOM CNT) (cog-set-tv!
   ATOM (CountTruthValue 1 0 CNT)))

(define (rebalance-count LLOBJ SECTION CNT)
   (set-count SECTION CNT)
   (for-each
      (lambda (XST) (set-count XST CNT))
      (LLOBJ 'get-cross-sections SECTION)))

(rebalance-count gsc mgsf x-cnt)
(rebalance-count gsc donor d-cnt)
; sets count on (c, kaabm) to zero.
; and also counts on all xsects from it.

; ------- Done. Move on to the next one.
(define pra (list-ref rstars 1)) ; count of 0
; call clique at line 370 of gram-projective.scm
; clique is line 92 of gram-pairwise.scm
(define dj (gsc 'right-element pra)) ; [c, kaavm]
(define wot (Word "b"))
(define osc (gsc 'get-pair wot dj)) ; not nil, count of zero.
; not nil because came from (c, kaabm) "other"

; Then this call
; (ACC-FUN gsc (gsc 'make-pair cls dj) pra 1.0)
; This is line 222 of shape-project.scm via line 363 of gram-projective.scm
; (reshape-merge gsc cls (gsc 'make-pair cls dj) wa pra 1.0)
; (reshape-merge LLOBJ GLS MRG W DONOR FRAC)
; (merge-resects gsc cls W MRG=make-pair DONOR=pra)
; (merge-resects LLOBJ GLS W XMR XDON)
(define xmr (gsc 'make-pair cls dj)) ; <{ab}, [c, kaavm]> count 67
(define resect (gsc 'make-section xmr)) ; (c, kaa{ab}m) null count junk
(define mgsf (gsc 'flatten cls resect)) ; (c, k{ab}{ab}{ab}m) count 23
(define donor (gsc 'make-section pra)) ; (c, kaaam) count 44
(define d-cnt 0)
(define x-cnt 67)
(rebalance-count gsc mgsf x-cnt)
(rebalance-count gsc donor d-cnt) ; kills (c, kaaam) to zero yes.

; ------- Done. Move on to the next one.
(define pra (list-ref rstars 2)) ; (a, gh) count 0
(define dj (gsc 'right-element pra))
(define wot (Word "b"))
(define osc (gsc 'get-pair wot dj)) ; (b, gh) not nil, count of zero.
; reshape-merge gsc cls (gsc 'make-pair cls dj) wa pra 1.0)
; (reshape-merge LLOBJ GLS MRG W DONOR FRAC)
(define mrg (gsc 'make-pair cls dj)) ; ({ab}, gh) count 99
(define flat (gsc 'flatten cls mrg)) ; #f line 247
(define donor pra)
(gsc 'make-cross-sections pra) ; both had count of 61
(for-each
   (lambda (XST)
      (define xmr (gsc 're-cross cls XST)) ;
      (accumulate-count gsc xmr XST 1)) ; sets these to 61
   (gsc 'make-cross-sections pra)) ; above looks good
(gsc 'make-cross-sections pra) ; both now zeroed. etc.
(gsc 'is-nonflat? cls mrg) ; #f line 271
(rebalance-count gsc mrg (gsc 'get-count mrg))
; set to 99, over-riding earlier work!

; ------- Done. Move on to the next one.
(define pra (list-ref rstars 3)) ; count 0
(define dj (gsc 'right-element pra))
(define wot (Word "b"))
(gsc 'get-pair wot dj) ; it's nil
(define xmr (gsc 'make-pair cls dj)) ; <{ab}, [c,kvaam]> count 44
(define resect (gsc 'make-section xmr)) ; junk (c, k{ab}aam)
(define mgsf (gsc 'flatten cls resect))
(define donor (gsc 'make-section pra)) ; (c, kaaam)
(define d-cnt 0)
(define x-cnt 44)
(rebalance-count gsc mgsf 44) ; whoa ... should this accumulate? XXX Bad
(rebalance-count gsc donor 0)

; ------- Done. Move on to the next one.
(define pra (list-ref rstars 4)) ; count 0
(define dj (gsc 'right-element pra))
(define xmr (gsc 'make-pair cls dj)) ; <{ab}, [c,kvabm]> count 23
(define resect (gsc 'make-section xmr))
(define mgsf (gsc 'flatten cls resect)) ; now has 44 on it from above.
(define d-cnt 0)
(define x-cnt 23)
(define donor (gsc 'make-section pra)) ; (c, kaabm)
(rebalance-count gsc mgsf 23) ; even worse... XXX
(rebalance-count gsc donor 0)

; ------- Done. Move on to the next one.
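
Aside, on the "should this accumulate?" question above: a toy sketch of why overwriting the merged section's count loses information when two donors flatten to the same merged section. Everything here is a stand-in: connectors are bare symbols, the class {ab} is the symbol ab, and flatten-conseq and tally are hypothetical helpers, not the real 'flatten method. The donor counts 44 and 23 are the ones from the walk-through.

```scheme
(use-modules (srfi srfi-1))

; Members of the class, and a stand-in for flattening: replace
; every connector naming a class member by the class symbol.
(define members '(a b))
(define (flatten-conseq conseq)
  (map (lambda (c) (if (memq c members) 'ab c)) conseq))

; Donor sections as (germ connector-sequence count).
(define donors
  '((c (k a a a m) 44)
    (c (k a a b m) 23)))

; Walk the donors, keyed by flattened connector sequence.
; COMBINE decides what to do when the key already has a count.
(define (tally combine)
  (fold (lambda (d acc)
          (let* ((key (flatten-conseq (cadr d)))
                 (old (assoc key acc))
                 (cnt (caddr d)))
            (if old
                (begin (set-cdr! old (combine (cdr old) cnt)) acc)
                (cons (cons key cnt) acc))))
        '() donors))

; "Set" semantics: the last donor wins.
(define by-set (tally (lambda (old new) new)))
; "Accumulate" semantics: the donors add up.
(define by-accumulate (tally +))
```

Both donors land on the single key (k ab ab ab m). With set semantics the merged count ends at 23 (whichever donor ran last); with accumulation it ends at 67 = 44 + 23. This sketch does not say which of the two the real code should do for cross-sections, only that the two choices differ as soon as more than one donor maps to the same flattened section.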
(define pra (list-ref rstars 5)) ; count 0
(define dj (gsc 'right-element pra))
(define xmr (gsc 'make-pair cls dj)) ; <{ab}, [c, kavam]> 44
(define resect (gsc 'make-section xmr))
(define mgsf (gsc 'flatten cls resect)) ; now has 23 from above
(define d-cnt 0)
(define x-cnt 44)
(define donor (gsc 'make-section pra)) ; (c, kaaam)
(rebalance-count gsc mgsf 44)
(rebalance-count gsc donor 0)

so
0 was <{ab}, [c, kavbm]> 23 from (c, kaabm) 23 -so mgsf==23
1 was <{ab}, [c, kaavm]> 67 from (c, kaa*m) -so mgsf==67=44+23
2 skip
3 was <{ab}, [c, kvaam]> 44 from (c, kaaam) so should not drop Bummer
4 was <{ab}, [c, kvabm]> 23 from (c, kaabm) 23 -so dropped even lower. piss.
5 was <{ab}, [c, kavam]> 44 from (c, kaaam)

prt-section
prt-cross-section
prt-element

expected-f-double-e (f, {e j}- b- {e j}+)
duuude xmr= 8.500 * [{e j}, ]
duuude donor=34.000 * (f, e- b- e+)
duuude mgs=34.000 * (f, {e j}- b- {e j}+)

connector-merge-tricon-dbl.scm:173: FAIL
connector-merge-tricon-dbl.scm:175: FAIL
connector-merge-tricon-dbl.scm:178: FAIL
connector-merge-tricon-dbl.scm:180: FAIL
connector-merge-tricon-dbl.scm:188: FAIL
connector-merge-tricon-dbl.scm:265: FAIL
connector-merge-tricon-dbl.scm:271: FAIL
connector-merge-tricon-dbl.scm:283: FAIL
connector-merge-tricon-dbl.scm:285: FAIL
connector-merge-tricon-dbl.scm:295: FAIL

tricon-dbl: just one donor
mgs= 8.500 * (f, {e j}- b- {e j}+)
donor=34.000 * (f, e- b- e+)
xmr= 8.500 * [{e j}, ]
xmr= 8.500 * [{e j}, ]
xdon=25.500 * [e, ]
xdon=25.500 * [e, ]

balance: two distinct donors contribute
mgs=67.000 * (c, k- {b a}+ {b a}+ {b a}+ m+)
xmr=44.000 * [{a b}, ]
   donor=44.000 * (c, k- a+ a+ a+ m+)
xmr=23.000 * [{a b}, ]
   donor=23.000 * (c, k- a+ a+ b+ m+)
xmr=44.000 * [{a b}, ]
   donor=44.000 * (c, k- a+ a+ a+ m+)
xmr=23.000 * [{a b}, ]
   donor=23.000 * (c, k- a+ a+ b+ m+)
xmr=67.000 * [{a b}, ]
   donor=44.000 * (c, k- a+ a+ a+ m+)
   donor=23.000 * (c, k- a+ a+ b+ m+)

algo: take max xmr of all of these.
But this won't work in general :-(

tridonor
mgs=97.000 * (c, k- {b a}+ {b a}+ {b a}+ m+)
xmr=44.000 * [{a b}, ]
   donor=44.000 * (c, k- a+ a+ a+ m+)
xmr=23.000 * [{a b}, ]
   donor=23.000 * (c, k- a+ a+ b+ m+)
xmr=30.000 * [{a b}, ]
   donor=30.000 * (c, k- a+ b+ a+ m+)
xmr=74.000 * [{a b}, ]
   donor=44.000 * (c, k- a+ a+ a+ m+)
   donor=30.000 * (c, k- a+ b+ a+ m+)
xmr=23.000 * [{a b}, ]
   donor=23.000 * (c, k- a+ a+ b+ m+)
xmr=67.000 * [{a b}, ]
   donor=44.000 * (c, k- a+ a+ a+ m+)
   donor=23.000 * (c, k- a+ a+ b+ m+)
xmr=30.000 * [{a b}, ]
   donor=30.000 * (c, k- a+ b+ a+ m+)

flatten
reshape-merge
   (when (equal? 'Section donor-type)
      (let ((flat (LLOBJ 'flatten GLS MRG)))
         (if flat

   (define resect (LLOBJ 'make-section XMR))
   (define germ (LLOBJ 'left-element resect))
   (define mgsf (LLOBJ 'flatten GLS resect))

merge-connectors
instead in assign-to-cluster

assign-to-cluster loops and calls CLIQUE passing accumulate-count
clique votes and based on vote, moves count to
(LLOBJ 'make-pair CLUST DJ)
replace this by flatten...

(define cnt-c-aaa 44)
(define cnt-c-aab 23)
(define cnt-c-aba 30)

==================
leak of meet-links:
(run-query f-right-star-pat itm f-right-star-var)
Need instrumentation line 483 of object-api.scm
that's not it... wtf...
(define mls (cog-get-atoms 'MeetLink))
Oh, they are the dot-product meets. in fold-api
FIXED

------------------
word-inst
try-count-one-word xxx

(define (try-count-one-word word-inst)
   (catch 'wrong-type-arg
      (lambda () (count-one-atom (word-inst-get-word word-inst)))
      (lambda (key .
args) #f)))
FIXED

----------
connector-merge-conext.scm
We expect only one cross-section: [e, ]

(setup-e-j-sections)
(setup-j-extra)
(setup-e-extra)
(MemberLink WA CLS)

; (e, abc) + (j, abc) -> ({ej}, abc)
; (e, dgh) + (j, dgh) -> ({ej}, dgh)
; (e, klm) + none -> p * ({ej}, klm) + (1-p) * (e, klm)

----------
conind.scm
-bad vs +good:
- 5.250 * [{e j}, <{e j}, a- b- $+>]
+ 5.250 * [{e j}, <{e j}, a- b- $+>]
+final xsec= 4.250 * [{e j}, <{e j}, $- g- h+>]
-44.000 * [b, <{e j}, a- $- c+>]
+44.000 * [b, <{e j}, a- $- c+>]
-44.000 * [c, <{e j}, a- b- $+>]
+44.000 * [c, <{e j}, a- b- $+>]

- 5.750 * [k, ]
- 4.250 * [k, ]
- 5.750 * [l, ]
- 4.250 * [l, ]
from
5.750 * (a, k- l- {e j}+)
4.250 * (f, k- l- {e j}+)

+12.750 * [k, ]
from 12.750 * (f, k- l- e+)
so OK...

----------
conext.scm description is misleading

pre merge:
11.000 * (e, a- b- e+)
21.000 * (j, a- b- e+)
9.000 * (e, a- b- j+)
61.000 * (e, d- g- h+)
16.000 * (j, d- g- h+)
13.000 * (e, a- b- c+)
44.000 * (e, k- l+ m+)
31.000 * (j, a- b- c+)
17.000 * (j, e- g- h+)

6.750 * (e, a- b- j+)
33.000 * (e, k- l+ m+)

15.750 * [e, ] ??? new, from (j, a- b- e+)
12.750 * [e, ] this cross-section corresponds to (1-p) * (j, egh)

duude post-merge sects:9
34.250 * ({e j}, a- b- {e j}+) == 32 + 2.25 = 32 + 1/4 of 9 OK
6.750 * (e, a- b- j+) == 3/4 of 9 so OK
15.750 * (j, a- b- e+) ... ?? should be gone ...
(rebalance-count LLOBJ resect x-cnt)

44.000 * ({e j}, a- b- c+)
77.000 * ({e j}, d- g- h+)
11.000 * ({e j}, k- l+ m+)
4.250 * ({e j}, {e j}- g- h+)
33.000 * (e, k- l+ m+)
12.750 * (j, e- g- h+)

----------
-21.000 * [k, <{e j}, $- l+ {e j}+>] ????
-21.000 * [l, <{e j}, k- $+ {e j}+>] ????

13.000 * ({e j}, a- b- c+)
44.000 * ({e j}, k- l+ m+)
21.000 * ({e j}, k- l+ {r s}+)
25.000 * ({r s}, a- b- c+)
61.000 * ({r s}, d- g- h+)
16.000 * ({r s}, a- b- {e j}+)

make-flat ...
ITEM_CLASS_NODE
(cog-subtype?
'ItemClass CLS)

merge-connectors LLOBJ CLA CLB clique)

-------------------
pre F2 merge:
15.750 * (j, a- b- e+)
12.750 * (j, e- g- h+)
17.250 * (a, k- l- e+)
5.750 * (a, k- l- {e j}+)
33.000 * (e, k- l+ m+)

19.000 * (f, a- b- c+) + 44.000 * ({e j}, a- b- c+)
7.750 * (f, a- b- {e j}+) + 5.250 * ({e j}, a- b- {e j}+)
36.000 * (f, d- g- h+) + 77.000 * ({e j}, d- g- h+)
34.000 * (f, k- l+ m+) + 11.000 * ({e j}, k- l+ m+)
6.250 * (f, {e j}- g- h+) + 4.250 * ({e j}, {e j}- g- h+)

18.750 * (f, e- g- h+) --- 0.35 into above...
23.250 * (f, a- b- e+) --- 0.35 into above ...
4.250 * (f, k- l- {e j}+) --- 0.35 into ({e j}, k- l- {e j}+) below
12.750 * (f, k- l- e+) --- 0.35 into ({e j}, k- l- {e j}+) below

post F2 sections=15
15.750 * (j, a- b- e+) NC
12.750 * (j, e- g- h+) NC
17.250 * (a, k- l- e+) NC
5.750 * (a, k- l- {e j}+) NC
33.000 * (e, k- l+ m+) NC

63.000 * ({e j}, a- b- c+) Yep as in master
21.138 * ({e j}, a- b- {e j}+) ... 7.75 + 5.25 + 0.35*23.25 = yep! as in master
113.000 * ({e j}, d- g- h+) Yep as in master
45.000 * ({e j}, k- l+ m+) Yep as in master
17.063 * ({e j}, {e j}- g- h+) ... 6.25 + 4.25 + 0.35*18.750 = 17.0625 as in master
5.950 * ({e j}, k- l- {e j}+) ... 0.35 * 4.250 + 0.35 * 12.750 ... master shows 8.712

15.113 * (f, a- b- e+) = 0.65 of orig = (* 0.65 23.25) as in master
12.188 * (f, e- g- h+) = 0.65 of orig. OK as in master
8.288 * (f, k- l- e+) as in master
2.763 * (f, k- l- {e j}+) new ... XXX and wrong XXX should be folded in

is-nonflat?

f stars=4
15.113 * (f, a- b- e+) = 0.65 of orig. OK
12.188 * (f, e- g- h+) = 0.65 of orig. OK
8.288 * (f, k- l- e+) OK
2.763 * (f, k- l- {e j}+) OK

ej cross=5
17.063 * [{e j}, <{e j}, $- g- h+>]
2.763 * [{e j}, ]
5.750 * [{e j}, ]
21.138 * [{e j}, <{e j}, a- b- $+>]
5.950 * [{e j}, <{e j}, k- l- $+>]

accumulate-count called from clique
assign-to-cluster called w/ cliq
merge-connectors

LLOBJ 'provides 'get-cross-sections

------------
22942644.0
2777968.0

23109300.0

Error: left and right total pairs not equal!
2750827.0 2774339.0
Error: left and right total counts not equal! 23109300.0 22930795.0
Crap.

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Restart where we left off. Do one whole word.
Size: 2777968.0 Total observations: 22942644.0

(define cls (WordClassNode "+ — “ ” _"))
(define WLIST (list (Word "+") (Word "—") (Word "“") (Word "”") (Word "_")))
(for-each (lambda (wrd) (Member wrd cls)) WLIST)
(define wa (Word "+"))
(define (clik CLUST SECT ACC-FUN)
   (define fla (star-obj 'make-flat CLUST SECT))
   (format #t "Clik: ~A to ~A\n" (prt-element SECT) (prt-element fla))
   (ACC-FUN star-obj fla SECT 1.0))
(assign-to-cluster star-obj cls wa clik)
(rebalance-shapes star-obj cls wa clik)

(star-obj 'clobber)
(recompute-mmt star-obj (list wa cls))
(recompute-mmt-final star-obj)
(print-matrix-summary-report star-obj)
Size: 2777968.0 Total observations: 22942644.0

Wow! OK, so that worked! Try again.
XXX No it didn't. It works only if one MemberLink is created.
If many are created, it's borken.
Error: left and right total pairs not equal! 2778046.0 2778105.0
Error: left and right total counts not equal! 22940912.0 22942086.0

after casting a bigger net:
Error: left and right total pairs not equal! 2777952.0 2777629.0
Error: left and right total counts not equal! 22936023.0 22928998.0

Maybe this is needed:
(for-each (lambda (wrd) (recompute-mmt star-obj wrd)) WLIST)
(recompute-mmt-final star-obj)
(print-matrix-summary-report star-obj)
Nope, has no effect.

(define wa (Word "—"))
Error: left and right total pairs not equal! 2777699.0 2777725.0
Error: left and right total counts not equal! 22943914.0 22941314.0

---------
(define sup-obj (add-support-api star-obj))
(define cnt 0)
(define (assn pra)
   (clik cls pra accumulate-count)
   (clik cls pra rebalance-merge)
   (define then (current-time))
   (star-obj 'clobber)
   (recompute-mmt star-obj (list wa cls))
   (format #t "did mmt in ~A secs\n" (- (current-time) then))
   (set!
   then (current-time))
   (recompute-mmt-final star-obj)
   (format #t "did mmt final ~A secs\n" (- (current-time) then))
   (print-matrix-summary-report star-obj)
   (set! cnt (+ 1 cnt))
   (format #t "Done ~A ~A\n" cnt (prt-element pra))
   (define lobs (sup-obj 'total-count-left))
   (define robs (sup-obj 'total-count-right))
   (define nlobs (inexact->exact (round lobs)))
   (define nrobs (inexact->exact (round robs)))
   (when (not (equal? nlobs nrobs))
      (format #t "Bonk: left and right total counts! ~A ~A\n" lobs robs)
      (foobar))
   (when (not (equal? nlobs 22942644))
      (format #t "Bonk: 22942644 counts changed ! ~A ~A\n" lobs robs)
      (foobar))
)

(define rstars (star-obj 'right-stars wa))
(define pra (list-ref rstars 0))
(for-each assn rstars)

very first one gets count wrong. ...
each pair has crosses. need to recompute marginals for ...
each lefty in each cross. but only that righty in each cross.

get-section
get-cross-sections

---------
first four:
2.000 * [+, <=Lit, ###LEFT-WALL###- +- $- +- .+>]
63.000 * [+, <+, $- —+ =N+>]
10.000 * [+, <+, .- $+ =Lond+ .+>]
2.000 * [+, <*, .- $+ =Dial.=+ 389+>]

2.000 * [{+ — “ ” _}, <=Lit, ###LEFT-WALL###- {+ — “ ” _}- $- {+ — “ ” _}- .+>]
63.000 * [{+ — “ ” _}, <{+ — “ ” _}, $- {+ — “ ” _}+ =N+>]
10.000 * [{+ — “ ” _}, <{+ — “ ” _}, .- $+ =Lond+ .+>]
2.000 * [{+ — “ ” _}, <*, .- $+ =Dial.=+ 389+>]

fifth one bombs:
4.000 * [+, <246, .- ”- $- =Outlook.=- 79- :- .+>]
4.000 * [{+ — “ ” _}, <246, .- {+ — “ ” _}- $- =Outlook.=- 79- :- .+>]
Error: left and right total counts not equal!
22942648.0 22942644.0

JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
(define cls (WordClassNode "+ — “ ” _"))
(define WLIST (list (Word "+") (Word "—") (Word "“") (Word "”") (Word "_")))
(for-each (lambda (wrd) (Member wrd cls)) WLIST)
(define wa (Word "+"))
(define (clik CLUST SECT ACC-FUN)
   (define fla (star-obj 'make-flat CLUST SECT))
   (format #t "Clik: ~A to ~A\n" (prt-element SECT) (prt-element fla))
   (ACC-FUN star-obj fla SECT 1.0))

(define-public (prt-dj-list LST)
   (string-concatenate
      (map (lambda (ELT) (format #f "~A\n" (prt-dj ELT))) LST)))

(define sup-obj (add-support-api star-obj))
(define pcnt 0)
(define (assn pra)
   (clik cls pra accumulate-count)
   (clik cls pra rebalance-merge)
   (set! pcnt (+ 1 pcnt))
   (format #t "Done ~A ~A\n" pcnt (prt-element pra))

   (define fla (star-obj 'make-flat cls pra))
   (define dj-set (make-atom-set))
   (define wrd-set (make-atom-set))
   (define pear-set (make-atom-set))
   (wrd-set cls)
   (wrd-set (star-obj 'left-element pra))

   (define (pair-margins PAIR)
      (pear-set PAIR)
      (wrd-set (star-obj 'left-element PAIR))
      (dj-set (star-obj 'right-element PAIR)))
   (define (cross-margins PAIR)
      (for-each pair-margins (star-obj 'get-cross-sections PAIR)))
   (define (expand-margins PAIR)
      (pear-set PAIR)
      (dj-set (star-obj 'right-element PAIR))
      (if (equal?
            'Section (cog-type PAIR))
         (cross-margins PAIR)
         (let ((sect (star-obj 'get-section PAIR)))
            (pair-margins sect)
            (cross-margins sect))))

   (expand-margins pra)
   (expand-margins fla)
   (define wrds (wrd-set #f))
   (define djs (dj-set #f))
   (define pears (pear-set #f))
   (format #t "words=~A djs=~A pairs=~A\n"
      (length wrds) (length djs) (length pears))
   (format #t "words: ~A\n" wrds)
   (format #t "djs:\n~A\n" (prt-dj-list djs))
   (format #t "pairs (before):\n~A\n" (prt-element-list pears))

   (define sup-obj (add-support-api star-obj))
   (define lobs (sup-obj 'total-count-left))
   (define robs (sup-obj 'total-count-right))
   (define nlobs (inexact->exact (round lobs)))
   (define nrobs (inexact->exact (round robs)))
   (format #t "Before: left and right total counts: ~A ~A\n" lobs robs)

   (define befw 0)
   (for-each
      (lambda (WRD)
         (define cnt (sup-obj 'right-count WRD))
         (set! befw (+ befw cnt))
         (format #t "before count=~A wrd=~A\n" cnt (cog-name WRD)))
      wrds)
   (newline)
   (define befd 0)
   (for-each
      (lambda (DJ)
         (define cnt (sup-obj 'left-count DJ))
         (set! befd (+ befd cnt))
         (format #t "before count=~A dj=~A\n" cnt (prt-dj DJ)))
      djs)
   (newline)

   (star-obj 'clobber)
   (define psu (add-support-compute star-obj))
   (for-each (lambda (DJ) (psu 'set-left-marginals DJ)) djs)
   (for-each (lambda (WRD) (psu 'set-right-marginals WRD)) wrds)

   (define aftw 0)
   (for-each
      (lambda (WRD)
         (define cnt (sup-obj 'right-count WRD))
         (set! aftw (+ aftw cnt))
         (format #t "after count=~A wrd=~A\n" cnt (cog-name WRD)))
      wrds)
   (newline)
   (define aftd 0)
   (for-each
      (lambda (DJ)
         (define cnt (sup-obj 'left-count DJ))
         (set!
            aftd (+ aftd cnt))
         (format #t "after count=~A dj=~A\n" cnt (prt-dj DJ)))
      djs)
   (newline)
   (format #t "wrdcnt bef: ~A aft: ~A\n" befw aftw)
   (format #t "djcnt bef: ~A aft: ~A\n" befd aftd)
   (newline)
   (format #t "pairs (after):\n~A\n" (prt-element-list pears))
   (newline)

   (define then (current-time))
   (recompute-mmt-final star-obj)
   (format #t "did mmt final ~A secs\n" (- (current-time) then))
   (define alobs (sup-obj 'total-count-left))
   (define arobs (sup-obj 'total-count-right))
   (define anlobs (inexact->exact (round alobs)))
   (define anrobs (inexact->exact (round arobs)))
   (format #t "Before: left and right total counts! ~A ~A\n" lobs robs)
   (format #t "After: left and right total counts! ~A ~A\n" alobs arobs)
   (newline)
   (newline)
   ; (print-matrix-summary-report star-obj)
   (when (not (equal? anlobs anrobs))
      (format #t "Bonk: left and right total counts! ~A ~A\n" alobs arobs)
      (foobar))
   (when (not (equal? anlobs 22942644))
      (format #t "Bonk: 22942644 counts changed ! ~A ~A\n" lobs robs)
      (foobar))
   (format #t "=================================\n")
)

(define rstars (star-obj 'right-stars wa))
(for-each assn rstars)

63.000 * [+, <+, $- —+ =N+>] to
63.000 * [{+ — “ ” _}, <{+ — “ ” _}, $- {+ — “ ” _}+ =N+>]
this is the second one, which was OK last time....

before count=6.0 wrd=+ — “ ” _
before count=21566.0 wrd=+

before count=63.0 dj=<+, $- —+ =N+>
before count=0 dj=<{+ — “ ” _}, $- {+ — “ ” _}+ =N+>
before count=65.0 dj= +- —+ =N+
before count=0 dj= {+ — “ ” _}- {+ — “ ” _}+ =N+

after count=195.0 wrd=+ — “ ” _ -- up by 189 == 3*63
after count=21440.0 wrd=+ -- down by 126 == 2*63

after count=0.0 dj=<+, $- —+ =N+> -- down by 63 expected
after count=63.0 dj=<{+ — “ ” _}, $- {+ — “ ” _}+ =N+> -- up by 63 expected
after count=2.0 dj= +- —+ =N+ -- down by 63
after count=63.0 dj= {+ — “ ” _}- {+ — “ ” _}+ =N+ -- up by 63

wrdcnt bef: 21572.0 aft: 21635.0 change of 63
djcnt bef: 128.0 aft: 128.0

what are pairs doing?
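
Aside: the before/after arithmetic from the printout above, redone as a tiny self-contained check. The four list names are made up; the numbers are copied from the output. The sketch only recomputes the sums; it does not touch the AtomSpace.

```scheme
; Marginal counts copied from the printout above.
(define words-before '(6 21566))      ; right-count per word
(define words-after  '(195 21440))
(define djs-before   '(63 0 65 0))    ; left-count per disjunct
(define djs-after    '(0 63 2 63))

(define (sum lst) (apply + lst))

; Net change over the words that were looked at.
(define word-delta (- (sum words-after) (sum words-before)))
; Net change over the disjuncts that were looked at.
(define dj-delta (- (sum djs-after) (sum djs-before)))
```

The disjunct side nets to zero while the word side gains 63. One reading, offered only as a guess, is that counts moved onto or off some row that is not in the list of words being summed, since a pure transfer among the listed words would also net to zero.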
pairs before:
63.000 * [+, <+, $- —+ =N+>] gives to
({}, {}- {}+ =N+) grows by 63

flat of (+, +- —+ =N+) and rebalancing:
[{}, <+, {}- $+ =N+>] grew by 63
[=N, <+, {}- {}+ $+] grew by 63

who shrank? ... should be same.
Why is =N not on the list? Because ... xc->sect -> xc's again.
so we missed some words!

After that fix .... it remains balanced, but count went ... down to ...
22928994.0

KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
(star-obj 'clobber)
(remove-empty-sections star-obj wa #t)
(recompute-mmt-final star-obj)

removed sections cause basis to change...
marginals live on shapes directly.... Crap.

Not deleting gives Total observations: 22925974.0
which shrank by 16670 counts (from original 22942644.0)
... but it is balanced!

Now remove:
(define cls (WordClassNode "+ — “ ” _"))
(remove-empty-sections star-obj cls #t)
(define WLIST (list (Word "+") (Word "—") (Word "“") (Word "”") (Word "_")))
(for-each (lambda (wrd) (remove-empty-sections star-obj wrd #t)) WLIST)

Doing above does not change the totals.

(define asc (add-support-compute star-obj))
(asc 'clobber)
(asc 'cache-all)

After the above, the total counts are the new counts. WTF

(rebalance-merge LLOBJ MRG DONOR FRAC)
(accumulate-count LLOBJ ACC DONOR FRAC
(print-matrix-summary-report star-obj)

Now it's 22913571.0 .. so this is not even stable under minor
perturbations! WTF 2x

Clobbering before reshaping gives 22925974.0 which goes back to the
original value. So clobbering does not help.

Balancing the sections first, then the crosses second gives
22927684.0 which is smaller by 14960.0 Drat.

Try again: after each word, rebalance, then do next word..
This gives 22913662.0 which is smaller still -- down by 28982

CMAKE_INSTALL_MESSAGE

[e, ] gets merged ... then rebalanced
then (j, e- g- h+) gets merged, but the count is wrong cause of the above.

Old Solution: all merges first, and flatten later.

Alt solution:
-- merge sects only;
-- do merge crosses at that time.
-- record these crosses.
-- rebalance that sect.
-- subtract crosses that don't come from sects
-- merge and rebalance those.

Nuke the pairwise stuff.

(make-class-node LLOBJ WLIST)

8 - ConnectorMergeTriConInd (Failed)
11 - ClassMergeBasic (Failed)
12 - ClassMergeCons (Failed)
13 - ClassMergeCother (Failed)

2.763 * (f, k- l- {e j}+)

duude frak=0.35 merge 4.250 * (f, k- l- {e j}+) into 0.000 * ({e j}, k- l- {e j}+)
should be 1.0 because is-non-flat? is #t
(LLOBJ 'is-nonflat? CLUST SECT))

------------
Total observations: 22925974.0 ... same as seen before.

(define dj-set (make-atom-set))
(dj-set (LLOBJ 'right-element PAIR)))
(LLOBJ 'right-stars WRD)

25.500 * [e, ] ???

counts on member links

gram-projective.scm move into ... shape-project.scm ??

LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL
The latest: the greatest: 22942954.0 vs orig 22942644.0 so +310 larger!
Welllll .... we are ... getting closer ...

Guh. Yep. 22942954.0

(define pcnt 0)
(define (check-bal CDJ)
   (set! pcnt (+ 1 pcnt))
   (format #t "Done ~A ~A\n" pcnt (prt-dj CDJ))

   (define dj-set (make-atom-set))
   (define pear-set (make-atom-set))

   (define (pair-margins PAIR)
      (pear-set PAIR)
      (dj-set (star-obj 'right-element PAIR)))
   (define (cross-margins PAIR)
      (for-each pair-margins (star-obj 'get-cross-sections PAIR)))
   (define (expand-margins PAIR)
      (pear-set PAIR)
      (dj-set (star-obj 'right-element PAIR))
      (if (equal? 'Section (cog-type PAIR))
         (cross-margins PAIR)
         (let ((sect (star-obj 'get-section PAIR)))
            (pair-margins sect)
            (cross-margins sect))))

   (for-each
      (lambda (WRD)
         (define pra (LLOBJ 'get-pair WRD CDJ))
         (when (not (nil?
pra)) (expand-margins (LLOBJ 'make-flat CLASS pra)) (expand-margins pra))) WLIST) (define djs (dj-set #f)) (define pears (pear-set #f)) (format #t "words=~A djs=~A pairs=~A\n" (length WLIST) (length djs) (length pears)) (format #t "djs:\n~A\n" (prt-dj-list djs)) (format #t "pairs (before):\n~A\n" (prt-element-list pears)) (define sup-obj (add-support-api star-obj)) (define lobs (sup-obj 'total-count-left)) (define robs (sup-obj 'total-count-right)) (define nlobs (inexact->exact (round lobs))) (define nrobs (inexact->exact (round robs))) (format #t "Before: left and right total counts: ~A ~A\n" lobs robs) (define befw 0) (for-each (lambda (WRD) (define cnt (sup-obj 'right-count WRD)) (set! befw (+ befw cnt)) (format #t "before count=~A wrd=~A\n" cnt (cog-name WRD))) WLIST) (newline) (define befd 0) (for-each (lambda (DJ) (define cnt (sup-obj 'left-count DJ)) (set! befd (+ befd cnt)) (format #t "before count=~A dj=~A\n" cnt (prt-dj DJ))) djs) (newline) (star-obj 'clobber) (define psu (add-support-compute star-obj)) (for-each (lambda (DJ) (psu 'set-left-marginals DJ)) djs) (for-each (lambda (WRD) (psu 'set-right-marginals WRD)) WLIST) (define aftw 0) (for-each (lambda (WRD) (define cnt (sup-obj 'right-count WRD)) (set! aftw (+ aftw cnt)) (format #t "after count=~A wrd=~A\n" cnt (cog-name WRD))) WLIST) (newline) (define aftd 0) (for-each (lambda (DJ) (define cnt (sup-obj 'left-count DJ)) (set! 
aftd (+ aftd cnt)) (format #t "after count=~A dj=~A\n" cnt (prt-dj DJ))) djs) (newline) (format #t "wrdcnt bef: ~A aft: ~A\n" befw aftw) (format #t "djcnt bef: ~A aft: ~A\n" befd aftd) (newline) (format #t "pairs (after):\n~A\n" (prt-element-list pears)) (newline) ; (recompute-mmt-final star-obj) (psu 'set-left-totals) (psu 'set-right-totals) (define alobs (sup-obj 'total-count-left)) (define arobs (sup-obj 'total-count-right)) (define anlobs (inexact->exact (round alobs))) (define anrobs (inexact->exact (round arobs))) (format #t "Before: left and right total counts: ~A ~A\n" lobs robs) (format #t "After: left and right total counts: ~A ~A\n" alobs arobs) (newline) (newline) ; (print-matrix-summary-report star-obj) (when (not (equal? anlobs anrobs)) (format #t "Bonk: left and right total counts! ~A ~A\n" alobs arobs) (foobar)) (when (not (equal? anlobs 22942644)) (format #t "Bonk: 22942644 counts changed ! ~A ~A\n" lobs robs) (foobar)) (format #t "=================================\n") ) 213 problem after 200 210 at exactly 220 ================================= Done 220 of 8682 R- —+ report at count=220 words=3 djs=6 pairs=6 wrds: ({+ — “ ” _} — R) pairs (before): 0.000 * [R, <{+ — “ ” _}, $- {+ — “ ” _}+>] 0.000 * [{+ — “ ” _}, <{+ — “ ” _}, R- $+>] 22.000 * (—, R- —+) 22.000 * [—, <—, R- $+>] 22.000 * [R, <—, $- —+>] 0.000 * ({+ — “ ” _}, R- {+ — “ ” _}+) at count=220 Before: left and right total counts: 22942644.0 22942644.0 before count=4303.0 wrd=+ — “ ” _ before count=24343.0 wrd=— before count=1895.0 wrd=R before count=0 dj= R- {+ — “ ” _}+ before count=985.0 dj=<—, $- —+> before count=0 dj=<{+ — “ ” _}, R- $+> before count=24.0 dj=<{+ — “ ” _}, $- {+ — “ ” _}+> before count=22.0 dj=<—, R- $+> before count=36.0 dj= R- —+ after count=4359.0 wrd=+ — “ ” _ after count=24338.0 wrd=— after count=1895.0 wrd=R after count=0.0 dj= R- {+ — “ ” _}+ after count=985.0 dj=<—, $- —+> after count=0.0 dj=<{+ — “ ” _}, R- $+> after count=24.0 dj=<{+ — “ ” _}, $- {+ — “ ” 
_}+> after count=22.0 dj=<—, R- $+> after count=36.0 dj= R- —+ at count=220 wrdcnt bef: 30541.0 aft: 30592.0 djcnt bef: 1067.0 aft: 1067.0 pairs (after): 0.000 * [R, <{+ — “ ” _}, $- {+ — “ ” _}+>] 0.000 * [{+ — “ ” _}, <{+ — “ ” _}, R- $+>] 22.000 * (—, R- —+) 22.000 * [—, <—, R- $+>] 22.000 * [R, <—, $- —+>] 0.000 * ({+ — “ ” _}, R- {+ — “ ” _}+) at count=220 Before: left and right total counts: 22942644.0 22942644.0 After: left and right total counts: 22942644.0 22942695.0 MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMmmm report at loop=8682 words=3084 djs=24830 pairs=59650 Nope. Its later. Main merge: balanced. After: left and right total counts: 22942644.0 22942644.0 Remain: Remaining 29586 cross postmmt post final total counts: 22738908.0 22827276.0 postclean: left and right total counts: 22942954.0 22942954.0 Ahhh. So consider: 21: (g, a+ b+ c+) 13: (g, a+ d+ c+) we have 21: [b, ] 13: [d, ] and say b,d are mergable. we also have 21: [a, ] 21: [c, ] 13: [a, ] 13: [c, ] which don't ever show but should merge fine. total djs=38983 sected djs=17618 left-over=29586 At loop=10000 remiandiner check: total counts: 22859176.0 22897154.0 At loop=1000 remiandiner check: total counts: 22934881.0 22938517.0 At loop=100 remiandiner check: total counts: 22941642.0 22942079.0 At loop=2 remiandiner check: total counts: 22942636.0 22942642.0 should be 22942644.0 so left is under by 8 right is under by 2 But the merge went well, so wtf !? left is 4 crosses of 2.0 each .. right is one cross that's preflat(??) recomp -- know the . 
” {+ — “ ” _} (list (WordNode "know") (WordNode "the") (WordNode ".") (WordNode "”")) (define sup-obj (add-support-api star-obj)) (define psu (add-support-compute star-obj)) (sup-obj 'right-count (WordNode "know")) ; 50313.0 (psu 'set-right-marginals (WordNode "know")) (sup-obj 'right-count (WordNode "know")) ; 50313.0 (sup-obj 'right-count (WordNode "the")) ; 1263905.0 (psu 'set-right-marginals (WordNode "the")) (sup-obj 'right-count (WordNode "the")) ; 1263905.0 (sup-obj 'total-count-right) 22942642.0 (psu 'set-right-totals) (sup-obj 'right-count (WordNode ".")) ; 722575.0 (psu 'set-right-marginals (WordNode ".")) (sup-obj 'right-count (WordNode ".")) ; 722575.0 still no change... (sup-obj 'right-count (WordNode "”")) ; 197793.0 (psu 'set-right-marginals (WordNode "”")) (sup-obj 'right-count (WordNode "”")) ; 197793.0 (star-obj 'clobber) (psu 'set-right-marginals (WordNode "know")) (sup-obj 'right-count (WordNode "truth")) ; 4582.0 (psu 'set-right-marginals (WordNode "truth")) (sup-obj 'right-count (WordNode "truth")) ; no change (define ae (star-obj 'get-all-elts)) (length ae) ; 2805101 (fold (lambda (X SUM) (+ SUM (get-count X))) 0 ae) ; 22942644.0 So ... that's correct. what's missing? (define (checkw WRD) (define oc (sup-obj 'right-count WRD)) (psu 'set-right-marginals WRD) (define nc (sup-obj 'right-count WRD)) (when (not (equal? oc nc)) (format #t "fail at ~A for old= ~A new= ~A\n" (cog-name WRD) oc nc) (foobar))) (for-each checkw (star-obj 'left-basis)) fail at + — “ ” _ for old= 182354.0 new= 182356.0 (psu 'set-right-totals) (sup-obj 'total-count-right) ; 22942644.0 FIXED! WTF .... how is this possible? (recompute-mmt LLOBJ (cons CLASS WLIST)) (WordClassNode "+ — “ ” _") (define WRD-LIST (cons (WordClassNode "+ — “ ” _") (list (WordNode "+") (WordNode "—") (WordNode "“") (WordNode "”") (WordNode "_")))) (length (wrd-set #f)) ; 5354 (length (dj-set #f)) ; 84517 (define xx (WordClassNode "+ — “ ” _")) (any (lambda (WRD) (equal? 
xx WRD)) (wrd-set #f)) (define bad #f) (define (checkd DJ) (define oc (sup-obj 'left-count DJ)) (psu 'set-left-marginals DJ) (define nc (sup-obj 'left-count DJ)) (when (not (equal? oc nc)) (format #t "fail old= ~A new= ~A for ~A\n" oc nc (prt-dj DJ)) (set! bad DJ) (foobar))) (define djl (dj-set #f)) (for-each checkd (dj-set #f)) (for-each checkd (star-obj 'right-basis)) fail old= 0 new= 2.0 for know- the- .+ {+ — “ ” _}+ (sup-obj 'total-count-left) ; 22942636.0 (psu 'set-left-totals) (sup-obj 'total-count-left) ; 22942640.0 so its coming back.. At loop=25000 remmt remainder check time: 1001 secs remiandiner check: total counts: 22942644.0 22942644.0 So this looks goood .... redone was 25437 of left-vers 29586 ------ merge-majority: Remaining 29586 cross in 48253 secs remmt reamindr time: 913 secs postmmt post final total counts: 22942079.0 22940642.0 wtf... missing clobber. In-group size=5: `+` `—` `“` `”` `_` ------ merge-majority: Merge 8682 sections in 9 secs ------ merge-majority: Remaining 29586 cross in 586 secs ------ merge-majority: Cleanup `+ — “ ” _` in 60 secs ------ Merged into `+ — “ ” _` in 701 secs ------ Recomputed MMT marginals in 851 secs In-group size=5: `+` `—` `“` `”` `_` ------ merge-majority: Merge 8682 sections in 8 secs ------ merge-majority: Remaining 29586 cross in 1138 secs ------ merge-majority: Cleanup `+ — “ ” _` in 58 secs ------ Merged into `+ — “ ” _` in 1252 secs Initial in-group size=2: `,` `;` In-group size=2 overlap = 3705 of 48451 disjuncts, commonality= 7.65% In-group size=2: `,` `;` ------ merge-majority: Merge 40041 sections in 37 secs ------ merge-majority: Remaining 108083 cross in 8599 secs OUCHHHHH ------ merge-majority: Cleanup `, ;` in 160 secs ------ Merged into `, ;` in 8820 secs QueueValue LinkStreamValue (make-data-logger ATOM KEY) (define smi (add-symmetric-mi-compute LLOBJ)) (define mmt-q (smi 'mmt-q)) Start merge 1 In-group size=5: `+` `—` `“` `”` `_` Remaining 29586 cross in 600 secs Start merge 2 In-group 
size=2: `,` `;` -- Remaining 108083 cross in 8650 secs Remaining 108083 cross in 159 secs Start merge 3 In-group size=2: `was` `is` -- Remaining 35109 cross in 979 secs Remaining 35109 cross in 57 secs Start merge 4 In-group size=4: `but` `and` `that` `as` -- Remaining 63812 cross in 3081 secs ------ merge-majority: Remaining 63815 cross in 113 secs Hmmm .... counts differ. Start merge 5 In-group size=5: `?` `.` `###LEFT-WALL###` `+ — “ ” _` `!` Remaining 161210 cross in 16100 secs OUCH ------ merge-majority: Remaining 161244 cross in 239 secs Hmm .... Start merge 6 In-group size=2: `?` `!` -- Remaining 2161 cross in 6 secs Start merge 7 In-group size=4: `He` `It` `I` `There` -- Remaining 27561 cross in 583 secs Start merge 8 In-group size=4: `"` `”` `,` `what` -- Remaining 45696 cross in 1408 secs Start merge 9 In-group size=2: `He It I There` `She` -- Remaining 19406 cross in 320 secs Start merge 10 In-group size=4: `He` `She` `They` `We` -- Remaining 2552 cross in 9 secs Start merge 11 In-group size=4: `A` `No` `He It I There She` `The` -- Remaining 25273 cross in 486 secs vs. Remaining 25273 cross in 48 secs Start merge 12 In-group size=4: `of` `in` `to` `from` -- Remaining 106362 cross in 8923 secs Start merge 13 In-group size=3: `He It I There` `to` `in` -- Remaining 32008 cross in 693 secs oops! (define log-anchor (AnchorNode "data logger")) (fetch-atom (AnchorNode "data logger")) -l cogserver-gram.scm (use-modules (opencog)) (use-modules (opencog learn)) (define dl (make-data-logger (Concept "foo")(Predicate "bar"))) (dl (Concept "asdf")) (cog-value (Concept "foo")(Predicate "bar")) (define (zap PRED) (define v (cog-value (AnchorNode "data logger") PRED)) (define vl (cog-value->list v)) (define t (cog-type v)) (define l (length vl)) (define svl (take vl (- l 1))) (cog-set-value! 
(AnchorNode "data logger") PRED (cog-new-value t svl))) (zap (Predicate "mmt-q")) (zap (Predicate "ranked-mi")) (zap (Predicate "sparsity")) (zap (Predicate "entropy")) (zap (Predicate "left dim")) (zap (Predicate "right dim")) (zap (Predicate "left-count")) (zap (Predicate "right-count")) (zap (Predicate "total entries")) ------ merge-majority: Remaining 29586 cross in 569 secs ------ merge-majority: Remaining 29586 cross in 43 secs Whoa! (define wli (list (Word "to") (Word "in") (WordClass "He It I There") (WordClassNode "He It I There to in"))) (recompute-mmt star-obj wli) 2.000 * [{He It I There}, ] does not have a matching section! Which means the balance is wrong! Thats .... bad! ((add-support-compute star-obj) 'cache-all) 19009 =================================================== (length (star-obj 'right-stars (Word "As"))) ; 237 (length (star-obj 'right-stars (Word "as"))) ; 2491 (length (star-obj 'right-stars (WordClass "As as"))) ; 2 (length (star-obj 'right-stars (WordClass "As as.i"))) ; 0 (length (star-obj 'right-stars (WordClass "As as.i.i"))) ; 0 ------ merge-majority: Merge 928 sections in 0 secs ------ merge-majority: Remaining 1774 cross in 1 secs (define smi (add-symmetric-mi-compute star-obj)) (define fmi (smi 'mmt-fmi (Word "As") (Word "as"))) ; 14.463903917003817 (smi 'mmt-joint-prob (Word "As") (Word "as")) ; 6.589255553333501e-7 (smi 'mmt-total) ; 869287881406.0 product of above: 572796.0 (define wclass (WordClass "foo")) (define in-grp (list (Word "As") (Word "as"))) (define merge-majority (make-merge-majority star-obj 0.5 4 #t)) (merge-majority wclass in-grp) (length (star-obj 'right-stars (WordClass "foo"))) ; 0 wlen ; 2 vote-thesh ; 2 (define voter-list wlist) (define (vote-to-accept? DJ) (<= vote-thresh (fold (lambda (WRD CNT) (if (nil? 
(star-obj 'get-pair WRD DJ)) CNT (+ 1 CNT))) 0 voter-list))) (define dj-set (make-atom-set)) (for-each (lambda (WRD) (for-each (lambda (PAIR) (dj-set (star-obj 'right-element PAIR))) (star-obj 'right-stars WRD))) WLIST) (define dj-list (dj-set #f)) (length dj-list) ; 2726 BTW, (length (star-obj 'right-stars (Word "As"))) ; 237 (length (star-obj 'right-stars (Word "as"))) ; 2491 (+ 237 2491) ; 2728 so only two is in common 1) what are those two? see below 2) why are they still there? 3) how can they give rise to this huge MI? (define all-stars (append (star-obj 'right-stars (Word "As")) (star-obj 'right-stars (Word "as")))) (define all-djs (map (lambda (PR) (star-obj 'right-element PR)) all-stars)) (define common-djs (keep-duplicate-atoms all-djs)) ; As early as ... As soon as -- both shapes (length common-djs) ; 2 (star-obj 'get-pair (Word "As") (list-ref common-djs 0)) ; OK - 8 (star-obj 'get-pair (Word "as") (list-ref common-djs 0)) ; OK - 19 (star-obj 'get-pair (Word "As") (list-ref common-djs 1)) ; OK - 599 (star-obj 'get-pair (Word "as") (list-ref common-djs 1)) ; OK - 956 dot product is (+ (* 8 19) (* 599 956)) ; 572796 which agrees with (smi 'mmt-joint-prob from above... (smi 'mmt-marginal (Word "As")) ; 2.5444505178451385e-6 (smi 'mmt-marginal (Word "as")) ; 1.1459704216614244e-5 (* (smi 'mmt-total) (smi 'mmt-marginal (Word "As"))) ; 2211860.0 (* (smi 'mmt-total) (smi 'mmt-marginal (Word "as"))) ; 9961782.0 (log2 (/ (smi 'mmt-joint-prob (Word "As") (Word "as")) (* (smi 'mmt-marginal (Word "As")) (smi 'mmt-marginal (Word "as"))))) ; 14.463903917003817 so oll korrect (length (done-djs #f)) ; 2132 (length left-overs) ; 1774 (star-obj 'get-section (star-obj 'get-pair (Word "As") (list-ref common-djs 0))) ; Are the commons in the left-overs? they should be (define done-djli (done-djs #f)) (atoms-subtract common-djs done-djli) ; two of them (atoms-subtract common-djs left-overs) ; none .. good (define sect-done? 
(make-once-predicate))
(merge-shapes left-overs)

(define (merge-dj DJ)
    (define have-majority (vote-to-accept? DJ))
    (if have-majority
        (format #t "accept DJ= ~A\n" (prt-dj DJ))))

None accepted ... why?
(vote-to-accept? (list-ref common-djs 0)) ; #t

So .... some other DJ clobbers them!

(define common-cross (list
    (star-obj 'get-pair (Word "As") (list-ref common-djs 0))
    (star-obj 'get-pair (Word "as") (list-ref common-djs 0))
    (star-obj 'get-pair (Word "As") (list-ref common-djs 1))
    (star-obj 'get-pair (Word "as") (list-ref common-djs 1))
))

(define common-sect
    (map (lambda (XES) (LLOBJ 'get-section XES)) common-cross))
(for-each
    (lambda (SEC) (format #t "Its ~A\n" (prt-element SEC)))
    common-sect)

(define (set-shape-done! SHP)
    (for-each
        (lambda (WRD)
            (define XSECT (LLOBJ 'get-pair WRD SHP))
            (when (not (nil? XSECT))
                (let* ((sect (LLOBJ 'get-section XSECT))
                        (done (sect-done? sect)))
                    (if (any (lambda (s) (equal? sect s)) common-sect)
                        (format #t "cross=~A gives ~A\n"
                            (prt-element XSECT) (prt-element sect))))))
        WLIST))

(define sect-done? (make-once-predicate))
(merge-shapes left-overs)

cross= 8.000 * [as, ] gives 8.000 * (early, As- as+)
cross=956.000 * [as, ] gives 956.000 * (soon, as- as+)
cross=19.000 * [as, ] gives 19.000 * (early, as- as+)

The above were not merged.... but they clobber the mergeables.

(for-each
    (lambda (xes) (format #t "commx ~A\n" (prt-element xes)))
    common-cross)

commx 8.000 * [As, ]
commx 19.000 * [as, ]
commx 599.000 * [As, ]
commx 956.000 * [as, ]

So ... for some sect, there is some xes that is mergeable, and
another xes that is not mergeable.

(merge-dj DJ) -- accept-section?

(define wclass (WordClass "foo"))
(define in-grp (list (Word "As") (Word "as")))
(define merge-majority (make-merge-majority star-obj 0.5 4 #t))
(merge-majority wclass in-grp)
(length (star-obj 'right-stars (WordClass "foo"))) ; 0

WTF... still not fixed!?
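For reference, the majority-vote rule used above, sketched stand-alone in plain Guile. This is a toy under stated assumptions, not the merge code: `toy-stars` and `has-pair?` are made-up stand-ins for (star-obj 'right-stars WRD) and (star-obj 'get-pair WRD DJ), the disjuncts are plain strings, and the voter list and threshold are passed explicitly instead of being closed over.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical toy data: word -> list of disjunct names seen with it.
(define toy-stars
  '(("As" . ("early-shape" "soon-shape" "only-As"))
    ("as" . ("early-shape" "soon-shape" "only-as-1" "only-as-2"))))

;; Stand-in for a non-empty 'get-pair lookup: #t if WRD was seen with DJ.
(define (has-pair? wrd dj)
  (let ((djs (assoc-ref toy-stars wrd)))
    (and djs (member dj djs) #t)))

;; Accept DJ when at least VOTE-THRESH of the VOTERS have it.
(define (vote-to-accept? dj voters vote-thresh)
  (<= vote-thresh
      (fold (lambda (wrd cnt) (if (has-pair? wrd dj) (+ 1 cnt) cnt))
            0 voters)))

(define voters '("As" "as"))

;; Union of all disjuncts on all voters, duplicates removed.
(define all-djs
  (delete-duplicates
    (append-map (lambda (w) (assoc-ref toy-stars w)) voters)))

;; With two voters and a threshold of 2, only the shared ones pass.
(define accepted
  (filter (lambda (dj) (vote-to-accept? dj voters 2)) all-djs))
```

In this toy only the two shared disjuncts pass the vote, which is the situation with "As" and "as" above. The vote only says what is mergeable; it says nothing about whether some other disjunct has already marked the same section as done.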
(define all-stars (append (star-obj 'right-stars (Word "As")) (star-obj 'right-stars (Word "as")))) (define all-djs (map (lambda (PR) (star-obj 'right-element PR)) all-stars)) (define common-djs (keep-duplicate-atoms all-djs)) (length common-djs) ; 2 (accept-shape? (list-ref common-djs 0)) ; #t (accept-shape? (list-ref common-djs 1)) ; #t (define (merge-dj DJ) (define have-majority (accept-conseq? DJ)) (if have-majority (format #t "Yes merge ~A\n" (prt-dj DJ)))) (define (rebalance-dj DJ) #f) (define (merge-one-shape SHP) (define have-majority (accept-shape? SHP)) (if have-majority (format #t "Yes merge shape ~A\n" (prt-dj SHP)))) (define (shape-done? SHP) (define prt #f) (when (any (lambda (S) (equal? S SHP)) common-djs) (set! prt #t) (format #t "duude check on ~A\n" (prt-dj SHP))) (any (lambda (WRD) (define XSECT (LLOBJ 'get-pair WRD SHP)) (if (nil? XSECT) #f (begin (when prt (format #t "duude sect=~A\n" (prt-element (LLOBJ 'get-section XSECT))) ) (sect-done? (LLOBJ 'get-section XSECT)))) ) WLIST)) (define (shape-done? SHP) (if (accept-shape? SHP) (format #t "duude accept ~A\n" (prt-dj SHP))) (any (lambda (WRD) (define XSECT (LLOBJ 'get-pair WRD SHP)) (if (nil? XSECT) #f (sect-done? (LLOBJ 'get-section XSECT)))) WLIST)) (define alt0 (ShapeLink (WordNode "early") (Connector (WordNode "as") (ConnectorDir "-")) (Connector (VariableNode "$connector-word") (ConnectorDir "+")))) (define alt1 (ShapeLink (WordNode "early") (Connector (WordNode "As") (ConnectorDir "-")) (Connector (VariableNode "$connector-word") (ConnectorDir "+")))) (accept-shape? alt0) #f ... why? (LLOBJ 'get-pair (Word "as") alt0) ; 19 (LLOBJ 'get-pair (Word "As") alt0) ; '() (LLOBJ 'get-pair (Word "as") alt1) ; 8 (LLOBJ 'get-pair (Word "As") alt1) ; '() (get-all-conseqs alt0 voter-list) The acceptrs are mis-designed! XXX get-all-conseqs (define (get-x WRD) (define XROS (LLOBJ 'get-pair WRD SHP)) (if (nil? 
XROS) #f
    (let* ((SECT (LLOBJ 'get-section XROS))
            (all-cros (LLOBJ 'get-cross-sections SECT)))

(define XROS (LLOBJ 'get-pair (Word "as") alt0))

OK, so alt0 comes in, vote is rejected,
-- gets rebalanced, wiping out the mergeable counts, (maybe??)
-- gets logged as "done already"
-- the later mergeable never happens, because it's "done"

alt0 comes in, not mergeable.
alt1 comes in, not mergeable.
But:
(define XROS (LLOBJ 'get-pair (Word "as") alt1))
(define SECT (LLOBJ 'get-section XROS))
(define all-cross (LLOBJ 'get-cross-sections SECT))
(define shape (LLOBJ 'right-element (car all-cross)))

shape is in left-overs, and
(vote-to-accept? shape) ; #t

So: .. for each shape, find others in list, vote them, then mark as done.

(concatenate
    (map
        (lambda (WRD)
            (define XROS (LLOBJ 'get-pair WRD SHP))
            (if (nil? XROS) '()
                (let* ((SECT (LLOBJ 'get-section XROS))
                        (ALL-X (LLOBJ 'get-cross-sections SECT)))
                    (map (lambda (CRS) (LLOBJ 'right-element CRS)) ALL-X))))
        WLIST))

-----
(define (get-alt-shapes SHP VALID-SHAPES)
    (define alt-shp (make-atom-set))
    (for-each
        (lambda (WRD)
            (define XROS (LLOBJ 'get-pair WRD SHP))
            (if (not (nil? XROS))
                (let* ((SECT (LLOBJ 'get-section XROS))
                        (ALL-X (LLOBJ 'get-cross-sections SECT)))
                    (for-each
                        (lambda (CRS) (alt-shp (LLOBJ 'right-element CRS)))
                        ALL-X))))
        WLIST)
    ; Is the alt-shape in the set?
    (lset-intersection equal? VALID-SHAPES (alt-shp #f)))

rebalance-dj

Yay! Fixed!
commx 8.000 * [As, ]
commx 19.000 * [as, ]

[a, ] [b, ]
==> N [a, ] + 0 [b, ]
==> M [a, ] + 0 [b, ]

(f, a- a+)
(f, b- a+)

====================================================
link-class.scm -- attic; old unimplemented idea.
====================================================

Next: unbalanced after two. How?
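As a reminder of what "balanced" means in these checks: summing the row marginals N(x,*) and summing the column marginals N(*,y) must give the same grand total, because both are sums over the same pairs. A minimal sketch in plain Guile with made-up data; `toy-pairs`, `stale-cache` and the helper names are assumptions for illustration, not the matrix API.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical toy sparse count matrix: ((row . col) . count)
(define toy-pairs
  '((("a" . "x") . 3.0)
    (("a" . "y") . 2.0)
    (("b" . "x") . 5.0)))

(define (row-of entry) (car (car entry)))
(define (col-of entry) (cdr (car entry)))
(define (count-of entry) (cdr entry))

;; N(x,*): sum over all columns for a fixed row.
(define (right-count row)
  (fold (lambda (e sum)
          (if (equal? row (row-of e)) (+ sum (count-of e)) sum))
        0 toy-pairs))

;; N(*,y): sum over all rows for a fixed column.
(define (left-count col)
  (fold (lambda (e sum)
          (if (equal? col (col-of e)) (+ sum (count-of e)) sum))
        0 toy-pairs))

(define rows (delete-duplicates (map row-of toy-pairs)))
(define cols (delete-duplicates (map col-of toy-pairs)))

;; Freshly computed, the two grand totals always agree.
(define total-from-rows (fold + 0 (map right-count rows)))
(define total-from-cols (fold + 0 (map left-count cols)))

;; A cached marginal left behind on a column that no longer has
;; any pairs ("z") inflates the total computed from the cache.
(define stale-cache '(("x" . 8.0) ("y" . 2.0) ("z" . 2.0)))
(define total-from-stale (fold + 0 (map cdr stale-cache)))
```

Here the fresh totals both come to 10.0 while the stale cache gives 12.0. So a mismatch between the two totals points at cached marginals that were not recomputed or not removed, rather than at the pair counts themselves.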
r10-singles.rdb

In-group size=2: `,` `;`
------ merge-majority: Merge 40028 sections in 36 secs
------ merge-majority: Remaining 108083 cross in 159 secs
------ merge-majority: Cleanup `, ;` in 154 secs
------ Merged into `, ;` in 401 secs
------ Recomputed MMT marginals in 1816 secs
Error: left and right total pairs not equal! 2758433.0 2723864.0
Error: left and right total counts not equal! 23114818.0 22942644.0

left tot is (sup-obj 'total-count-left)
It is sum_x N(x,*)

(define asc (add-support-compute star-obj))
(asc 'total-count-left) ; 23114818.0 -- no change

N(x,*) is (asc 'right-count x)

(define sup (add-support-api star-obj))
(define sumo 0)
(for-each
    (lambda (WRD)
        (define rc (sup 'right-count WRD))
        (define rn (asc 'right-count WRD))
        (define erc (inexact->exact rc))
        (define ern (inexact->exact rn))
        (set! sumo (+ sumo erc))
        (when (not (equal? erc ern))
            (format #t "mismatch at ~A pre=~D new=~D\n"
                (cog-name WRD) erc ern)
            (foobar)))
    (star-obj 'left-basis))

sumo
$4 = 22942644

wtf...

(define (sum-right-count ITEM)
    (sum-count (star-obj 'right-stars ITEM)))

===================
total-count-left is documented as sum_x N(x,*)
where N(x,*) is (asc 'right-count x)
and

(define (compute-total-count-from-left)
    (fold
        ;;; (lambda (item sum) (+ sum (sum-left-count item)))
        (lambda (item sum) (+ sum (api-obj 'left-count item)))
        0
        (star-obj 'right-basis)))

so it is implemented opposite to the documentation: it sums the
left-counts over the right basis.

(define sup (add-support-api star-obj))
(define asc (add-support-compute star-obj))
(define sumo 0)
(define sumn 0)
(define fail 0)
(for-each
    (lambda (DJ)
        (define rc (sup 'left-count DJ))
        (define erc (inexact->exact rc))
        (define rn (asc 'left-count DJ))
        (define ern (inexact->exact rn))
        (set! sumo (+ sumo erc))
        (set! sumn (+ sumn ern))
        (when
            ; (and (not (equal? erc ern)) (not (equal? 0 ern)))
            ; (equal? 0 erc)
            ; (and (not (equal? erc ern)) (not (equal? 'ShapeLink (cog-type DJ))))
            (not (equal?
erc ern)) (format #t "not equal at ~A pre=~D new=~D\n" (prt-dj DJ) erc ern) ; (format #t "zero at ~A pre=~D \n" ; (prt-dj DJ) erc) (set! fail (+ fail 1)))) (foobar) (star-obj 'right-basis)) insta-fail: mismatch at pre=2 new=0 So these are all ... zero and should have been deleted! fail ; 29323 remove-empty-sections (define sup (add-support-api LLOBJ)) (define (remove-all-empties LLOBJ WRD-LIST) Crapp. Still not working right. Even after one: 23114818 so first merge is already bad. apparently, its only shapes ... [something, ] (work, _- something+) [_, ] (define sup (add-support-api star-obj)) (define dc 0) (for-each (lambda (DJ) (define rc (sup 'left-count DJ)) (define erc (inexact->exact rc)) (when (equal? 0 erc) (set! dc (+ dc 1)) (cog-delete-recursive! DJ))) (star-obj 'right-basis)) (define sup (add-support-api star-obj)) (define asc (add-support-compute star-obj)) (define sumo 0) (define sumn 0) (define fail 0) (define newzero 0) (define nonz 0) (for-each (lambda (DJ) (define rc (sup 'left-count DJ)) (define erc (inexact->exact rc)) (define rn (asc 'left-count DJ)) (define ern (inexact->exact rn)) (set! sumo (+ sumo erc)) (set! sumn (+ sumn ern)) (when (equal? 0 ern) (set! newzero (+ 1 newzero)) (cog-delete-recursive! DJ)) (when (not (equal? erc ern)) (when (not (equal? 0 ern)) (set! nonz (+ 1 nonz)) (store-atom (asc 'set-left-marginals DJ)) (let ((newc (inexact->exact (sup 'left-count DJ)))) (when (not (equal? newc ern)) (format #t "meta fail ~A ~A ~A for ~A\n" erc ern newc (prt-dj DJ)) (foobar))) ) (set! fail (+ fail 1))) ) (star-obj 'right-basis)) no failures... So .. first pass left RAM in a good state, but left DB in a bad state... remove-all-empty-sections Try again: do merge, check balances load-all-from-storage, check-balances. (define shapes (cog-get-atoms 'ShapeLink)) (define shp (car shapes)) (define miss 0) (define bad #f) (for-each (lambda (shp) (when (not (equal? 
1 (length (cog-keys shp)))) ; (format #t "oops ~A has ~A\n" (prt-dj shp) (cog-keys shp)) (set! bad shp) (if (< 1 (cog-incoming-size shp)) (foobar)) (set! miss (+ 1 miss)) )) shapes) miss ;5850 (ShapeLink . 844157) (load-atoms-of-type 'ShapeLink) ((CrossSection (ctv 1 0 2) (WordNode "###LEFT-WALL###") (define badm (ShapeLink (WordNode "-") (Connector (VariableNode "$connector-word") (ConnectorDir "-")) (Connector (WordNode "me") (ConnectorDir "-")) (Connector (WordNode "”") (ConnectorDir "+")) (Connector (WordNode "“") (ConnectorDir "+")))) fresh: (ShapeLink . 838580) none are empty (define chk (cog-link 'ShapeLink (WordNode "-") (Connector (VariableNode "$connector-word") (ConnectorDir "-")) (Connector (WordNode "me") (ConnectorDir "-")) (Connector (WordNode "”") (ConnectorDir "+")) (Connector (WordNode "“") (ConnectorDir "+")))) The keys got wiped out. How? (define (have-key? atm) (define nk (PredicateNode "*-Norm Key cover-section")) (define ov (cog-value atm nk)) (cog-set-value! atm nk #f) (fetch-atom atm) (when (not (equal? 1 (length (cog-keys atm)))) (format #t "no keys in storage!\n") (foobar)) (define nv (cog-value atm nk)) (when (not (equal? nv ov)) (format #t "value changed: ~A vs ~A\n" ov nv) (foobar))) OK, cross-section get zeroed ... (cog-set-value! atm nk ov) (cog-value sbad (PredicateNode "*-Norm Key cover-section")) (cog-set-value! sbad (PredicateNode "*-Norm Key cover-section") (FloatValue 0 0 0 0)) (define b (make-sbad)) (define c (CrossSection (WordNode "###LEFT-WALL###") b)) (define s (star-obj 'get-section c)) no keys in storage! msg=after orpha erasure (have-keys? c) (star-obj 'clobber) (define WRD-LIST (cons (WordClassNode "+ — “ ” _") (list (WordNode "+") (WordNode "—") (WordNode "“") (WordNode "”") (WordNode "_")))) (remove-all-empty-sections star-obj WRD-LIST) (define (del-sect SEC) ; Cleanup cross sections, if they are provided. (for-each (lambda (xst) (format #t "duuude looper ~A\n" (prt-element xst)) (when (and (cog-atom? xst) (is-zero? 
(LLOBJ 'get-count xst))) (format #t "duuude delete ~A\n" (prt-element xst)) (let ((shp (LLOBJ 'right-element xst))) (cog-extract! xst) (format #t "duuude extract ~A\n" (prt-dj shp)) (cog-extract! shp) ;; Safe, its not recursive. ))) (LLOBJ 'get-cross-sections SEC)) (when (is-zero? (LLOBJ 'get-count SEC)) (format #t "duuude delete sect ~A\n" (prt-element SEC)) (let ((csq (LLOBJ 'right-element SEC))) (cog-delete! SEC) (cog-delete! csq) ;; Safe; because its not recursive. ))) (for-each (lambda (ITEM) (when (is-zero? (cog-count ITEM)) (format #t "fail ~A\n" ITEM) (foobar))) (LLOBJ 'right-stars (WordNode "”"))) (define bad (ShapeLink (WordNode "J") (Connector (WordNode "_") (ConnectorDir "-")) (Connector (VariableNode "$connector-word") (ConnectorDir "+")))) (define c (car (cog-incoming-set bad))) (define s (star-obj 'get-section c)) (define s (Section (WordNode "J") (ConnectorSeq (Connector (WordNode "_") (ConnectorDir "-")) (Connector (WordNode "_") (ConnectorDir "+"))))) (define bad2 (ShapeLink (WordNode "Hm") (Connector (WordNode "###LEFT-WALL###") (ConnectorDir "-")) (Connector (WordNode "“") (ConnectorDir "-")) (Connector (WordNode ".") (ConnectorDir "+")) (Connector (VariableNode "$connector-word") (ConnectorDir "+")))) (define bad-two (ShapeLink (WordNode "added") (Connector (WordNode "”") (ConnectorDir "-")) (Connector (WordNode "she") (ConnectorDir "-")) (Connector (VariableNode "$connector-word") (ConnectorDir "+")) (Connector (WordNode "“") (ConnectorDir "+"))) (define b (make-sbad)) (cog-incoming-set b) (define WRD-LIST (cons (WordClassNode "+ — “ ” _") (list (WordNode "+") (WordNode "—") (WordNode "“") (WordNode "”") (WordNode "_")))) (define rs (star-obj 'right-stars (WordNode "“"))) (length rs) ; 3849 (define (is-zero? cnt) (< cnt 1.0e-10)) (define (del-sect SEC) ; Cleanup cross sections, if they are provided. (for-each (lambda (xst) (format #t "duuude looper ~A\n" (prt-element xst)) (when (and (cog-atom? xst) (is-zero? 
(LLOBJ 'get-count xst))) (format #t "duuude delete ~A\n" (prt-element xst)) (let ((shp (LLOBJ 'right-element xst))) ;(cog-extract! xst) (format #t "duuude extract ~A\n" (prt-dj shp)) ;(cog-extract! shp) ;; Safe, its not recursive. ))) (LLOBJ 'get-cross-sections SEC)) (when (is-zero? (LLOBJ 'get-count SEC)) (format #t "duuude delete sect ~A\n" (prt-element SEC)) ;(let ((csq (LLOBJ 'right-element SEC))) ;(cog-delete! SEC) ;(cog-delete! csq) ;; Safe; because its not recursive. ;) )) (define (del-sect SEC) (for-each (lambda (xst) (when (and (cog-atom? xst) (is-zero? (LLOBJ 'get-count xst))) (let ((shp (LLOBJ 'right-element xst))) (if (equal? shp b) (format #t "duuude extract ~A\n" (prt-dj shp))) ))) (LLOBJ 'get-cross-sections SEC)) (when (is-zero? (LLOBJ 'get-count SEC)) (format #t "duuude delete sect ~A\n" (prt-element SEC)) )) (define (del-xes XST) (define sct (LLOBJ 'get-section XST)) (when (and (cog-atom? sct) (is-zero? (LLOBJ 'get-count sct))) (del-sect sct)) (when (and (cog-atom? XST) (is-zero? (LLOBJ 'get-count XST))) (let ((shp (LLOBJ 'right-element XST))) (if (equal? shp b) (format #t "FOunt it!!\n")) ))) (define c (car (cog-incoming-set b))) (define s (star-obj 'get-section c)) (any (lambda (SECT) (equal? SECT c)) rs) ; #f (any (lambda (SECT) (equal? SECT s)) rs) ; #f (define xes (star-obj 'get-cross-sections s)) .... apparently, the needed cross sections have been deleted earlier. And thus never matrialize. So the remove of marginals can't be done this way. (EvaluationLink (PredicateNode "*-Direct Sum Wild (gram-class⊕cross-section)") (WordNode "“") (AnyNode "right-wild-direct-sum")) (define sup (add-support-api star-obj)) (define asc (add-support-compute star-obj)) (define sumo 0) (define sumn 0) (define fail 0) (for-each (lambda (DJ) (define rc (sup 'left-count DJ)) (define erc (inexact->exact rc)) (define rn (asc 'left-count DJ)) (define ern (inexact->exact rn)) (set! sumo (+ sumo erc)) (set! sumn (+ sumn ern)) (when ; (and (not (equal? 
erc ern)) (not (equal? 0 ern))) ; (equal? 0 erc) ; (and (not (equal? erc ern)) (not (equal? 'ShapeLink (cog-type DJ)))) (not (equal? erc ern)) ; (format #t "not equal at ~A pre=~D new=~D\n" ; (prt-dj DJ) erc ern) ; (foobar) ; (format #t "zero at ~A pre=~D \n" ; (prt-dj DJ) erc) (set! fail (+ fail 1)))) (star-obj 'right-basis)) not equal at pre=0 new=3 fail ; 12596 wow. Last time, it was 29K .... (define sumo 0) (define sumn 0) (define noteq 0) (define zold 0) (define znew 0) (define nzo 0) (define ns 0) (for-each (lambda (DJ) (define rc (sup 'left-count DJ)) (define erc (inexact->exact rc)) (define rn (asc 'left-count DJ)) (define ern (inexact->exact rn)) (set! sumo (+ sumo erc)) (set! sumn (+ sumn ern)) (if (not (equal? erc ern)) (set! noteq (+ noteq 1))) (if (equal? 0 erc) (set! zold (+ 1 zold))) (if (equal? 0 ern) (set! znew (+ 1 znew))) (if (and (not (equal? erc ern)) (not (equal? 0 erc))) (set! nzo (+ 1 nzo))) (if (and (not (equal? erc ern)) (not (equal? 'ShapeLink (cog-type DJ)))) (set! ns (+ 1 ns))) ) (star-obj 'right-basis)) noteq ; 12596 zold ; 12596 znew ; 0 nzo ; 0 ns ; 0 (define cset-obj (make-pseudo-cset-api)) (define covr-obj (add-covering-sections cset-obj)) (covr-obj 'fetch-pairs) Elapsed time to load cross-sections: 105 seconds (ShapeLink . 825711) (CrossSection . 17758) ????? (define sup (add-support-api star-obj)) (define nk 0) (define zold 0) (for-each (lambda (SH) (define rc (sup 'left-count SH)) (define erc (inexact->exact rc)) (if (equal? 0 erc) (set! zold (+ 1 zold))) (when (not (equal? 1 (length (cog-keys SH)))) (set! nk (+ 1 nk)))) (cog-get-atoms 'ShapeLink)) (define badl (filter (lambda (SH) (not (equal? 1 (length (cog-keys SH))))) (cog-get-atoms 'ShapeLink))) nk ;0 (length (cog-get-atoms 'ShapeLink)) ; 825711 zold ; 0 so all shpes got loaded, an they all have marginals on them Some CrossSections are stored!! WTF!? (for-each cog-delete! (cog-get-atoms 'CrossSection)) (covr-obj 'explode-sections) ; (ShapeLink . 
838307) (- 838307 825711) ; 12596 same as the missing ones. (define bad (car badl)) (define c (car (cog-incoming-set bad))) (define s (star-obj 'get-section c)) (define cs (star-obj 'get-cross-sections s)) They're all there. How did they not get stored? -------- Fresh: (ShapeLink . 838580) all have keys and non-zero margins There are no saved crosses. after blowup: (ShapeLink . 838580) so nothing new added. agglo-rank.scm: gram-majority.scm: shape-project.scm: shape-project.scm is causing trouble. ------------- Fixed !? again? (ShapeLink . 810773) So that's less than before nk ; 0 zold ; 0 excellent! No more stores Crosses, too crap noteq ; 27534 zold ; 27534 znew ; 0 nzo ; 0 ns ; 0 Refix again, Now its (ShapeLink . 838307) OK, good same as before No change in the bad stats (ShapeLink . 810773) (define oshp (cog-get-atoms 'ShapeLink)) (define nshp (cog-get-atoms 'ShapeLink)) (define no (atoms-subtract nshp oshp)) (for-each (lambda (dj) (for-each (lambda (xs) (format #t "yo ~A\n" (prt-element xs))) (cog-incoming-set dj))) (take no 20)) (for-each (lambda (dj) (for-each (lambda (xs) (define sc (star-obj 'get-section xs)) (when (not (equal? (star-obj 'get-count sc) (star-obj 'get-count xs))) (format #t "wtf ~A\n ~A\n" (prt-element xs) (prt-element sc)) (FOOBAR) )) (cog-incoming-set dj))) no) (cog-close (RocksStorageNode "rocks:///home/ubuntu/data//r10-singles.rdb")) (cog-open (RocksStorageNode "rocks:///home/ubuntu/data//r10-bad-shapes.rdb")) (define bsh (Anchor "bad shape")) (define bxs (Anchor "bad cross")) (for-each (lambda (dj) (store-atom (Member dj bsh)) (for-each (lambda (xs) (store-atom (Member xs bxs))) (cog-incoming-set dj))) no) duuude size of affected=120354 duuude size of orphasn=40751 Before deletion of empties: (ShapeLink . 875448) (define sup (add-support-api star-obj)) (define nk 0) (define zer 0) (for-each (lambda (SH) (define rc (sup 'left-count SH)) (define erc (inexact->exact rc)) (if (equal? 0 erc) (set! zer (+ 1 zer))) (when (not (equal? 
1 (length (cog-keys SH)))) (set! nk (+ 1 nk)))) (cog-get-atoms 'ShapeLink)) zer ; 37141 so (- 875448 37141) is 838307 -- which is the correct number. (define nonzer-set (make-atom-set)) (for-each (lambda (SH) (define rc (sup 'left-count SH)) (define erc (inexact->exact rc)) (if (not (equal? 0 erc)) (nonzer-set SH))) (cog-get-atoms 'ShapeLink)) (define nonzer (nonzer-set #f)) (length nonzer) ; 838307 (cog-close (RocksStorageNode "rocks:///home/ubuntu/data//r10-singles.rdb")) (cog-open (RocksStorageNode "rocks:///home/ubuntu/data//r10-bad-shapes.rdb")) (define bsh (Anchor "bad shape")) (define bxs (Anchor "bad cross")) (fetch-incoming-set bsh) (define bad-set (make-atom-set)) (for-each (lambda (MEM) (bad-set (gar MEM))) (cog-incoming-set bsh)) (define baddies (bad-set #f)) (length baddies) ; 27534 -- yes OK. (define bb (atoms-subtract baddies nonzer)) (length bb) ; 0 so all the baddies are non-zero, this is correct. (define afa (Anchor "affected")) (define affect (make-atom-set)) (for-each (lambda (MEM) (affect (gar MEM))) (cog-incoming-set afa)) (define afd (affect #f)) (length afd) ; 120354 (define ba (atoms-subtract baddies afd)) (length ba) ; 0 so all baddies should have been stored ... (define orp (Anchor "orphan")) (define orphans (make-atom-set)) (for-each (lambda (MEM) (orphans (gar MEM))) (cog-incoming-set orp)) (define ors (orphans #f)) (length ors) ; 40751 (define bo (atoms-subtract baddies ors)) (length bo) ; 27534 -- none of the baddies are orphans (define gone 0) (for-each (lambda (badj) (if (not (cog-atom? badj)) (set! gone (+ 1 gone)))) baddies) (define WRD-LIST (cons (WordClassNode "+ — “ ” _") (list (WordNode "+") (WordNode "—") (WordNode "“") (WordNode "”") (WordNode "_")))) (remove-all-empty-sections star-obj WRD-LIST) (for-each (lambda (badj) (if (not (cog-atom? badj)) (set! gone (+ 1 gone)))) baddies) gone is zero... (delete-orphans star-obj '() ors) gone is still zero... 
(define zer 0)
(for-each (lambda (SH)
      (define rc (sup 'left-count SH))
      (define erc (inexact->exact rc))
      (if (equal? 0 erc) (set! zer (+ 1 zer)))
   )
   baddies)

zer ; 0

So why aren't baddies .... stored ???

(cog-open (RocksStorageNode "rocks:///home/ubuntu/data//r10-singles.rdb"))

Is it the same set each time?

(define nk (PredicateNode "*-Norm Key cover-section"))
(define nkc 0)
(define miss 0)
(define zk 0)
(define (check-key atm)
   (when (not (cog-atom? atm))
      (format #t "No more atom\n")
   )
   ; Save the in-RAM value, clobber it, and refetch from storage.
   (define ov (cog-value atm nk))
   (cog-set-value! atm nk #f)
   (fetch-atom atm)
   (when (equal? 0 (length (cog-keys atm)))
      (set! zk (+ 1 zk)))
   (when (not (equal? 1 (length (cog-keys atm))))
      (set! nkc (+ 1 nkc))
      ; (format #t "no keys in storage!\n")
   )
   ; Compare what came back from storage, then restore the old value.
   (define nv (cog-value atm nk))
   (if (not (equal? ov nv))
      (set! miss (+ 1 miss)))
   (cog-set-value! atm nk ov)
)

(for-each check-key baddies)
nkc ; 27534
miss ; 27534

and they're still all bad...

(for-each store-atom baddies)
(for-each check-key baddies)

now it's good. So what's going on?

(define std (story #f))
(length std) ; 79603

affected: 120354
orphans: 40751
should be stored: 79603

(for-each check-key std)
nkc ; 27534
miss ; 27534
zk ; 27534

so the key is missing! ... in storage ...

(define sup (add-support-api star-obj))
(define nk 0)
(define zer 0)
(for-each (lambda (SH)
      (define rc (sup 'left-count SH))
      (define erc (inexact->exact rc))
      (if (equal? 0 erc) (set! zer (+ 1 zer)))
      (when (not (equal? 1 (length (cog-keys SH))))
         (set! nk (+ 1 nk))))
   std)

nk ; 0
zer ; 0

backend bug: store is being called, key is not being stored. !?

all zero: notone miscomp zerok
but (for-each check-key std) still shows fails.
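Sketch of the check-key idea in isolation: save the value, clobber it, refetch, compare, restore. Two hash tables stand in for the AtomSpace and the storage backend. This is an assumption-laden toy: `ram`, `disk`, `mock-store!`, `mock-fetch!` and `round-trip-ok?` are made-up names, and nothing here touches the real StorageNode API.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical stand-ins: `ram` plays the AtomSpace, `disk` plays the
;; storage backend. Both map an atom name to the value under one key.
(define ram (make-hash-table))
(define disk (make-hash-table))

;; Store copies the in-RAM value to disk.
(define (mock-store! atm)
  (hash-set! disk atm (hash-ref ram atm #f)))

;; Fetch overwrites the in-RAM value with whatever is on disk
;; (#f if the atom was never stored).
(define (mock-fetch! atm)
  (hash-set! ram atm (hash-ref disk atm #f)))

;; Round trip: remember the value, clobber it, refetch, compare, restore.
;; Returns #t when storage agrees with what was in RAM.
(define (round-trip-ok? atm)
  (define old (hash-ref ram atm #f))
  (hash-set! ram atm #f)
  (mock-fetch! atm)
  (let ((ok (equal? old (hash-ref ram atm #f))))
    (hash-set! ram atm old)
    ok))

;; Count the atoms whose stored value is missing or stale.
(define (count-misses atoms)
  (count (lambda (a) (not (round-trip-ok? a))) atoms))

(define atoms '("shape-a" "shape-b" "shape-c"))
(for-each (lambda (a) (hash-set! ram a 42.0)) atoms)
(mock-store! "shape-a")          ; only one atom ever reaches "disk"

(define misses-before (count-misses atoms))
(for-each mock-store! atoms)     ; the brute-force fix: store everything
(define misses-after (count-misses atoms))
```

The restore step matters: without it, a failed fetch would leave the in-RAM value clobbered, and a later store would write the empty value back out.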
duuu post-mmt check len 79603 not-one 0 miscomp 0 zoer-key 0 duuu post-remove emtpies len 79603 not-one 27534 miscomp 27534 zoer-key 27534 (recheck "post-remove emtpies") QueryLink TypedVariableLink car is VariableNode duuu rmx xst len 79603 not-one 1 miscomp 1 zoer-key 1 duuude ins=42 fail at 0.000 * [You, ] dflt-delete scm/opencog/persist.scm _sn->remove_atom(h, false); api/StorageNode.cc removeAtom followed by getAtomSpace()->extract_atom(h, recursive) in AtomTable.cc removeAtom in backing store. rocks does this unconditionally. w/o checking. if (handle->markForRemoval()) return false; should be true ? Wow. Fixed at last!? ---------------------------------------------------------- 1 + — “ ” _ 25 "much well good little long" (define sup (add-support-api star-obj)) (sup 'total-count-left) ; 0.0 (define asc (add-support-compute star-obj)) (asc 'set-left-totals) still borken (star-obj 'clobber) (asc 'set-left-totals) still borken (define lb (star-obj 'left-basis)) (define lc (map (lambda (itm) (sup 'right-count itm)) lb)) (asc 'cache-all) Finished left norm marginals in 2187 secs (sup 'total-support-left) ; still zero ! (define w (car lb)) (star-obj 'right-stars w) (for-each (lambda (s) (format #t "~A\n" (prt-element s))) (star-obj 'right-stars w)) (sup 'right-count w) ; 2855.0 well, that seems plausible, so wtf. so cache-all fixed the marginals.... (asc 'total-support-left) ; 2196945.0 .. well wrong but at least not zero. (asc 'set-left-totals) (sup 'total-support-left) ; 2196945.0 wtf so why did cache-all not work? all-left-marginals set-left-marginals (right-basis) ... right-basis==DJ do-left-totals ... (set-left-marginals COL) ; COL==DJ (sum-left-norms COL) (star-obj 'left-stars COL) ; left-stars = (*, DJ) api-obj 'set-left-norms COL left-count --> for DJ right-count -- for W do-left-totals ... compute-total-support-from-left loop over 'left-basis (i.e. words) ... fixed. 
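Sanity check on the totals logic, as a toy: the grand total N(*,*) must come out the same whether it is summed from the word marginals N(w,*) or from the disjunct marginals N(*,d). The sketch below uses a made-up list of (word disjunct count) triples and plain lists; `pairs`, `sum-where` and the local `right-count` and `left-count` are illustrative names only, not the support-api methods.

```scheme
(use-modules (srfi srfi-1))

;; Toy sparse matrix: (word disjunct count) triples. All data is made up.
(define pairs
  '(("the" "dj1" 3.0) ("the" "dj2" 1.0)
    ("a"   "dj1" 2.0) ("cat" "dj3" 4.0)))

;; Sum the counts of all triples accepted by the predicate.
(define (sum-where keep?)
  (fold (lambda (pr sum) (if (keep? pr) (+ sum (third pr)) sum))
        0.0 pairs))

;; N(w,*): sum over all disjuncts for a fixed word.
(define (right-count w)
  (sum-where (lambda (pr) (equal? w (first pr)))))

;; N(*,d): sum over all words for a fixed disjunct.
(define (left-count d)
  (sum-where (lambda (pr) (equal? d (second pr)))))

(define left-basis (delete-duplicates (map first pairs)))
(define right-basis (delete-duplicates (map second pairs)))

;; N(*,*) computed two ways; these must agree.
(define total-from-words (fold + 0.0 (map right-count left-basis)))
(define total-from-djs (fold + 0.0 (map left-count right-basis)))
```

If one of the two totals reads zero while the per-item marginals look plausible, the marginals were probably cached but the total was summed over the wrong basis, or before the marginals were written.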
---------------- (define in-grp (list (Word "could") (WordClass "must would") (WordClass "might should will may") (Word "can") (Word "shall"))) (define wclass (WordClass "could must would might should will may can shall")) (define merge-majority (make-merge-majority star-obj 0.5 4 #t)) (merge-majority wclass in-grp) ;;;; (recompute-marginals LLOBJ (cons wclass in-grp)) (star-obj 'clobber) (define LLOBJ star-obj) (define WRD-LIST (cons wclass in-grp)) (define affected-basis (get-affected-basis LLOBJ WRD-LIST)) (define wrd-list (first affected-basis)) (define dj-list (second affected-basis)) (length dj-list) ; 53367 (length wrd-list) ; 2021 ;;;; (define orphans (recompute-mmt LLOBJ wrd-list dj-list)) (length (dj-orphan #f)) ; 21667 (length (wrd-orphan #f)) ; 2 ;; (WordClass "might should will may") (WordClassNode "must would") (remove-all-empty-sections LLOBJ WRD-LIST) ;;; (delete-orphans LLOBJ (dj-orphan #f) (wrd-orphan #f)) (define wm (car (wrd-orphan #f))) ; this is right-wild for l-atom (EvaluationLink (PredicateNode "*-Direct Sum Wild (gram-class⊕cross-section)") (WordClassNode "might should will may") (AnyNode "right-wild-direct-sum")) (LLOBJ 'left-element wm) (PredicateNode "*-Direct Sum Wild (gram-class⊕cross-section)") Oh no!!!! (EvaluationLink pred-node left-wnode R-ATOM) (if (and (not disjoint-left) (equal? 'EvaluationLink (cog-type pair)) (equal? pred-node (gar pair))) (gdr pair) ' left-element --> get-pair-left FIXED! Yayyy! fix direct sum wildcard handling -------------------------------------- (recomp-all-sim LLOBJ WX) 0.5-0.2 -- 19008 0.6-0.3 -- 19009 0.7-0.4 -- 19006 0.8-0.5 -- 19012 COGSERVER_CONF=/home/ubuntu/experiments/run-6/4-cogserver/cogserver-gram-en.conf (dump-log "/tmp") (dump-merges "/tmp") (cog-set-value! 
(AnchorNode "data logger") (Predicate "quorum-comm-noise") (FloatValue 0.8 0.5 4 200)) (define port (open "/tmp/log-0.5-0.2.dat" (logior O_CREAT O_WRONLY))) (define port (open "/tmp/log-0.6-0.3.dat" (logior O_CREAT O_WRONLY))) (define port (open "/tmp/log-0.7-0.4.dat" (logior O_CREAT O_WRONLY))) (define port (open "/tmp/log-0.8-0.5.dat" (logior O_CREAT O_WRONLY))) (print-log port) (close port) frqobj 'pair-entropy set-left-wild-entropy (define frq (add-pair-freq-api star-obj)) (frq 'right-wild-fentropy (Word "the")) cache-right-entropy no pair entropies available... (WordNode "king") (define cfr (make-compute-freq star-obj)) (cfr 'init-freq) (cfr 'cache-all) 2777968 -- correct (define ent-obj (add-entropy-compute star-obj)) (ent-obj 'cache-all-right-entropy) (ent-obj 'right-entropy) ; 8.514785640953038 (define rpt-obj (add-report-api star-obj)) (rpt-obj 'left-entropy) wtf.... report says left-entropy -- The sum H_left = -sum_x P(x,*) log_2 P(x,*) that's wrong .. .fixed. but ent says H_left = sum_y P(*,y) log_2 P(*,y) left-sum = sum_y f(y) sum_y 'left-wild-freq y where y==right side of pair. and left-wild=(*,y) Hang on ... right-support = sum_x P(x,*) |(x,*)| right-count = sum_x P(x,*) N(x,*) h_left(y) = -sum_x P(x,y) log_2 P(x,y) h_right(x) = -sum_y P(x,y) log_2 P(x,y) -32 15112 970910 22942644 22942644 2144763 12.739998 16.073733 6.5032292 10.610815 -33 15113 970112 22942644 22942644 2136119 12.744734 16.063484 6.4864505 10.606281 wtf ... older vs newer code ... self-sim is missing make-class-logger log-class mu=2.3 sigma=0.55 "self-mi-hist.dat" using 2:(exp(-(log($2)-mu)**2/(2*sigma**2))/($2 * sigma * sqrt(2*3.14159))) with lines lw 2 title "N(2.3,0.55)", \ (fold (lambda (W S) (+ S (frq-obj 'right-wild-freq W))) 0 all-words) correct ... 
(fold (lambda (C S) (+ S C)) 0 (array->list (list-ref bin-wei-went 1))) (fold (lambda (W S) (+ S (* (frq-obj 'right-wild-freq W) (frq-obj 'right-wild-fmi W)))) 0 all-words) (fold (lambda (W S) (+ S (frq-obj 'right-wild-mi W))) 0 all-words) (fold (lambda (C S) (+ S C)) 0 (array->list (list-ref bin-wei-wmi 1))) (fold (lambda (C W S) (+ S (* C W))) 0 (array->list (list-ref bin-wei-wmi 1)) (array->list (list-ref bin-wei-wmi 0))) 8.5148 19.463 (define SIM-ID "shape-mi") (define sap (add-similarity-api LLOBJ #f SIM-ID)) (define smi (add-symmetric-mi-compute LLOBJ)) (define ol2 (/ 1.0 (log 2.0))) (define (log2 x) (if (< 0 x) (* (log x) ol2) -inf.0)) (define mmt-q (smi 'mmt-q)) (define (compute-sim WA WB) (define fmi (smi 'mmt-fmi WA WB)) (define mwa (smi 'mmt-marginal WA)) (define mwb (smi 'mmt-marginal WB)) (define qmi (+ fmi (* 0.5 (log2 (* mwa mwb))))) (define rmi (+ qmi mmt-q)) (store-atom (sap 'set-pair-similarity (sap 'make-pair WA WB) (FloatValue fmi rmi qmi)))) (define WA (Word "dog")) (define WB (Word "cat")) fmi = 3.947147405113547 mwa = 1.1695792378830594e-4 (log2 mwa) -13.061722773328585 (log2 mwb) -14.876559284766689 mmt-q 11.945777087600217 (+ fmi (* 0.5 (+ (log2 mwa) (log2 mwb)))) = -10.02199362393409 (+ mmt-q = 1.9237834636661262 (define frq-obj (add-pair-freq-api star-obj)) (frq-obj 'right-wild-fentropy WA) ; 21.17375559818631 (frq-obj 'right-wild-fentropy WB) ; 21.36729919924388 (frq-obj 'right-wild-fmi WA) ; 7.035118871435061 (frq-obj 'right-wild-fmi WB) ; 8.800124255097883 (sap 'pair-count WA WB) ------------- (SimilarityLink . 
280094) for 0.7

-----------------
mmt-total = (trans-obj 'total-mmt-count)
mmt-q = (- (log2 mmt-total (* tcl tcl))
tcl = (sup-obj 'total-count-left)

right-product = sum_d N(w,d) N(u,d)

------ Start merge 282 with seed pair `had` and `been`
Initial in-group size=2: `been` `had`
In-group size=2 overlap = 1 of 3178 disjuncts, commonality= 0.03%
------ merge-majority: Merge 1107 sections in 0 secs
------ merge-majority: Remaining 1864 cross in 2 secs

198 for 0.7 >>C —.i<<

vote-thresh == 2. There must be ... something, cause MI so high!

(define vote-thresh 2)
(define WLIST (list (Word "C") (Word "—")))
(define LLOBJ star-obj)
(define (vote-to-accept? DJ)
   (<= vote-thresh
      (fold
         (lambda (WRD CNT)
            (if (nil? (LLOBJ 'get-pair WRD DJ)) CNT (+ 1 CNT)))
         0
         WLIST)))

(define cseq (filter
   (lambda (SEC) (equal? (cog-type SEC) 'ConnectorSeq))
   dj-list))

(any (lambda (DJ) (vote-to-accept? DJ)) cseq) #f
(any (lambda (DJ) (vote-to-accept? DJ)) dj-list) #f

(define okl (filter vote-to-accept? dj-list))
(length okl) ; 1 !!

left-over calc is wrong. It knocked out unvoted sections...

update-memb-count
change do-erge to return non-zero.. and fold it? in merge-dj DONE

find-in-group
optimal-in-group -- change to use plain MI. DONE

How about voting?
vote-to-accept? accepted if DJ is shared by majority.

recompute-entropies -- defer ...

num-classes (cog-count-atoms 'WordClassNode)) DONE
diag extension size. (num simil) (cog-count-atoms 'Sim) DONE
total left, right entropy via report-obj DONE
self-mi of new class DONE
size of class DONE
number of disjuncts on class (support) DONE
count via marginal count. DONE
entropy of class and logli DONE
count on class via member link, and merge entropy DONE
number of disjuncts merged ? fraction? -- available as class support
minus delta size of matrix. use count
avoid rename log-stuff DONE

why is overlap tiny, but num actually merged huge?

count-shared-conseq ... why isn't this same as vote?
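One way the two numbers can differ, sketched with toy data: a plain vote counts every section that exists, while a vote with a noise floor only counts sections whose count reaches the floor. This is a guess at the mechanism. The real count-shared-conseq may apply its noise parameter differently, and `counts`, `get-count` and `make-voter` are made-up names for illustration.

```scheme
(use-modules (srfi srfi-1))

;; Toy counts N(word, disjunct); all words, disjuncts and numbers are made up.
(define counts
  '((("little" . "dj1") . 12) (("young" . "dj1") . 7)
    (("little" . "dj2") . 9)  (("young" . "dj2") . 2)
    (("little" . "dj3") . 5)))

;; Count on the section (wrd, dj), or 0 if there is no such section.
(define (get-count wrd dj)
  (let ((hit (assoc (cons wrd dj) counts)))
    (if hit (cdr hit) 0)))

;; Build a predicate: does the disjunct collect at least `vote-thresh`
;; votes? A word votes if its section exists and its count is at or
;; above the noise floor. With noise = 0 this is the plain existence vote.
(define (make-voter wlist vote-thresh noise)
  (lambda (dj)
    (<= vote-thresh
        (fold (lambda (wrd cnt)
                (let ((c (get-count wrd dj)))
                  (if (and (< 0 c) (<= noise c)) (+ 1 cnt) cnt)))
              0 wlist))))

(define wlist '("little" "young"))
(define dj-list '("dj1" "dj2" "dj3"))

;; Number of disjuncts accepted without and with a noise floor of 4.
(define shared-no-floor (count (make-voter wlist 2 0) dj-list))
(define shared-floor-4 (count (make-voter wlist 2 4) dj-list))
```

Here dj2 is accepted by the plain vote but dropped once the floor is 4, because one of its two sections has a count of only 2.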
make-group-similarity 'noise-col-supp

(define wlist (list (Word "little") (Word "young")))
(count-shared-conseq star-obj 0.5 4 wlist) ; (107 1311)
(count-shared-conseq star-obj 0.5 0 wlist) ; (355 4109)

(define vote-thresh 2)
(define LLOBJ star-obj)
(define wlist (list (Word "greatest") (Word "south")))
(define voter-list wlist)
(define WLIST wlist)

(length dj-list) ; 4109 ... that agrees.

(fold (lambda (DJ SUM) (if (vote-to-accept? DJ) (+ SUM 1) SUM)) 0 dj-list)
355 OK so that's right too.

Initial in-group size=2: `greatest` `South`
In-group size=2 overlap = 2 of 100 disjuncts, commonality= 2.00%
In-group size=2: `greatest` `South`
------ merge-majority: Merge 40 of 91 sections in 0 secs
------ merge-majority: Remaining 42 of 91 cross in 0 secs

(define wlist (list (Word "greatest") (Word "south")))
(count-shared-conseq star-obj 0.5 4 wlist) ; (1 100)
(count-shared-conseq star-obj 0.5 0 wlist) ; (3 387)

(length dj-list) ; 387 correct
(fold (lambda (DJ SUM) (if (vote-to-accept? DJ) (+ SUM 1) SUM)) 0 dj-list)
; 3 expected

(define (below-floor? DJ)
   (any
      (lambda (WRD)
         (define SEC (star-obj 'get-pair WRD DJ))
         (and (not (nil? SEC)) (< (cog-count SEC) 4)))
      wlist))

(fold (lambda (DJ SUM)
      (if (or (vote-to-accept? DJ) (below-floor? DJ)) (+ SUM 1) SUM))
   0 dj-list)
; 243 wow that's a lot of noisy bits.

sum_i n_i/N log n_i/N
   = (1/N) sum_i n_i (log n_i - log N)
   = (1/N) sum_i n_i log n_i - ((log N) /N) sum_i n_i
   = (1/N) sum_i n_i log n_i - log N

--------------------------------------------------------
n4
Throw to key `bad-summation' with args `(compute-total-entropy "Left and right entropy sums fail to be equal: 19.363223854751855 19.295559903799987\n")'.

after the reorder,
fail to be equal: 19.382658536329917 19.463229535208917\n")'.

and this time, hits on the first one. WTF.
oh, need to do pairs, too
Really? Cause a global recompute was enough to fix it.
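Quick numerical check of the identity sum_i n_i/N log n_i/N = (1/N) sum_i n_i log n_i - log N from the derivation above, on made-up counts. A minimal sketch in plain Guile, no AtomSpace needed. `direct` and `rearranged` are just local names, and `neq` is the same tolerance test used elsewhere in these notes.

```scheme
(use-modules (srfi srfi-1))

(define (log2 x) (/ (log x) (log 2.0)))

;; Made-up counts n_i; N is their sum.
(define counts '(5.0 3.0 2.0 10.0))
(define N (fold + 0.0 counts))

;; Direct form: sum_i (n_i/N) log2 (n_i/N)
(define direct
  (fold (lambda (n sum) (+ sum (* (/ n N) (log2 (/ n N)))))
        0.0 counts))

;; Rearranged form: (1/N) sum_i n_i log2 n_i - log2 N
(define rearranged
  (- (/ (fold (lambda (n sum) (+ sum (* n (log2 n)))) 0.0 counts) N)
     (log2 N)))

;; Same tolerance test as in the pair-frequency checks.
(define (neq A B) (< 1.0e-12 (abs (- A B))))

(format #t "direct=~A rearranged=~A differ=~A\n"
        direct rearranged (neq direct rearranged))
```

The rearranged form is the handy one after a merge: only the n_i that changed need their n_i log n_i terms redone, plus a single log N, provided N itself is kept up to date.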
Doing pairs: 19.607278945916782 19.687849944789978

(define fra (add-pair-freq-api star-obj))
(define frc (make-compute-freq star-obj))
(frc 'init-freq)

(define ol2 (- (/ 1 (log 2))))
(define (neq A B) (< 1.0e-12 (abs (- A B))))

(define ftot 0)
(define htot 0)
(define nelt 0)
(for-each (lambda (PR)
      (set! nelt (+ 1 nelt))
      (define ofrq (fra 'pair-freq PR))
      (define nfrq (frc 'compute-pair-freq PR))
      (when (neq ofrq nfrq)
         (format #t "frq foo not ~A ~A ~A ~A\n" nelt ofrq nfrq (prt-element PR))
         (foobar))
      (set! ftot (+ ftot ofrq))
      (define oli (fra 'pair-logli PR))
      (define nli (* ol2 (log ofrq)))
      (when (neq oli nli)
         (format #t "badli not ~A ~A ~A ~A\n" nelt oli nli (prt-element PR))
         (foobar))
      (define oen (fra 'pair-entropy PR))
      (define nen (* oli ofrq))
      (when (neq oen nen)
         (format #t "badent not ~A ~A ~A ~A\n" nelt oen nen (prt-element PR))
         (foobar))
      (set! htot (+ htot oen))
   )
   (star-obj 'get-all-elts))

nelt ; 2774729
ftot ; 1.0000000000298555
htot ; 19.45924890334887 vs initial Entropy Total: 19.463

conclude: pairs are good.

do it again:
nelt ; 2774729
ftot ; 1.0000000000298555 Heh no change.
htot ; 19.45924890334887 no change

(define enc (add-entropy-compute star-obj))

(define ndj 0)
(define hdj 0)
(for-each (lambda (DJ)
      (set! ndj (+ 1 ndj))
      (define ofr (fra 'left-wild-freq DJ))
      (define nfr (frc 'compute-left-freq DJ))
      (when (neq ofr nfr)
         (format #t "bad freq ~A ~A ~A ~A\n" ndj ofr nfr (prt-dj DJ))
         (foobar))
      (define ohm (fra 'left-wild-entropy DJ))
      (define nhm (enc 'compute-left-entropy DJ))
      (when (neq ohm nhm)
         (format #t "bad marg ent ~A ~A ~A ~A\n" ndj ohm nhm (prt-dj DJ))
         (format #t "the freq=~A\n" ofr)
         (format #t "the pairs: ~A\n" (prt-element-list (star-obj 'left-stars DJ)))
         (format #t "dj ~A\n" DJ)
         (foobar))
      (set! hdj (+ hdj ohm))
   )
   (star-obj 'right-basis))

Merged into `+ — ” _ ) [` in 108 secs

bad marg ent 25 4.088722873782772e-6 2.0443614368913863e-6
the freq=8.717391073147455e-8
the pairs: 2.000 * [{+ — ” _ ) [}, ]

So old is exactly 2x too big ...
(/ 2.0 22942644.0) ; 8.717391073147455e-8 OK
(- (/ (log 8.717391073147455e-8) (log 2))) ; 23.451528326963775
(* 23.451528326963775 8.717391073147455e-8) ; 2.0443614368913867e-6 OK

Who calls set-freq on the dj? because it's misleading.
It's cache-left-freq calls set-left-wild-freq calls set-freq
which is wrong-ish. using the freq-key ... whatever no harm.

set-entropy uses entr-name entropy-key
called by set-left-wild-entropy

(define fdj
   (ShapeLink
      (WordNode "What’s")
      (Connector (WordNode "###LEFT-WALL###") (ConnectorDir "-"))
      (Connector (WordNode "“") (ConnectorDir "-"))
      (Connector (WordNode "she") (ConnectorDir "+"))
      (Connector (WordNode "?") (ConnectorDir "+"))
      (Connector (VariableNode "$connector-word") (ConnectorDir "+")))
)

ah hah!

(define ca (first (star-obj 'left-stars fdj)))
(define cb (second (star-obj 'left-stars fdj)))

(for-each (lambda (PR) (frc 'cache-pair-freq PR))
   (star-obj 'left-stars fdj))

(fra 'pair-freq ca) is 8.717391073147455e-8 good
(fra 'pair-freq cb) is not zero .. bad!

(frc 'compute-pair-freq cb) ; gives zero... good.
(frc 'cache-pair-freq cb) ; gives #f
(fra 'pair-freq cb) ; is not set.

Ah hah!
*) cache-pair-freq doesn't set the freq if it's zero.
   if it did, then log would be nan FIXED
*) compute-left-freq does not need pair freq so OK.
*) compute-left-entropy needs pair entropy and doesn't check for nan
   finite?

----------
Now it passes:
ndj 1045828
hdj 19.459248903090117 -- equals htot, good.

error was:
ums fail to be equal: 19.459248903090117 19.539819901963327\n")'.

So left checks out.

(define (neq A B) (< 1.0e-12 (abs (- A B))))
(define nwr 0)
(define hwr 0)
(for-each (lambda (WR) (set!
nwr (+ 1 nwr))
      (define ofr (fra 'right-wild-freq WR))
      (define nfr (frc 'compute-right-freq WR))
      (when (neq ofr nfr)
         (format #t "bad freq ~A ~A ~A ~A\n" nwr ofr nfr (cog-name WR))
         (foobar))
      (define ohm (fra 'right-wild-entropy WR))
      (define nhm (enc 'compute-right-entropy WR))
      (when (neq ohm nhm)
         (format #t "bad marg ent ~A ~A ~A ~A\n" nwr ohm nhm (cog-name WR))
         (format #t "the freq=~A\n" ofr)
         ; (format #t "the pairs: ~A\n" (prt-element-list (star-obj 'right-stars WR)))
         (format #t "word ~A\n" WR)
         (foobar))
      (set! hwr (+ hwr ohm))
   )
   (star-obj 'left-basis))

(WordNode "king")
bad marg ent 1 0.0025159878286492524 0.00250798473072315 king

(length (star-obj 'right-stars (WordNode "king"))) ; 380

(define wrd (WordNode "king"))
(enc 'compute-right-entropy wrd) ; 0.00250798473072315 as printed.
(fra 'right-wild-freq wrd) ; 1.244407575691799e-4
fent ; 20.154045826415793

Maybe wrd is not in the list? Seems like that's why....
so ... in short: every affected dj has a word that needs recomputation.
Ouch.

TODO: what if frequency is zero? FIXED

(define wl (list (Word "+") (Word "—") (Word "”") (Word "_")
   (Word ")") (Word "[") (WordClass "+ — ” _ ) [")))

(define e (make-elapsed-secs))
(define affected-basis (get-affected-basis star-obj wl))
(define wrd-list (first affected-basis))
(define dj-list (second affected-basis))
(format #t "------ Find affected basis of (~A, ~A) in ~A secs\n"
   (length wrd-list) (length dj-list) (e))

(define kin (Word "King"))
(any (lambda (WR) (equal? WR kin)) wrd-list)
#t

so it should have been found....
(define fra (add-pair-freq-api star-obj)) (define frc (make-compute-freq star-obj)) (define enc (add-entropy-compute star-obj)) (frc 'init-freq) (define kin (Word "King")) (fra 'right-wild-entropy kin) (enc 'compute-right-entropy kin) before: 0.0032419551623822907 should be: 0.0032298633415824055 ------ Start merge 5 with seed pair `and` and `but` In-group size=6: `but` `and` `as` `for` `or` `than` ------ merge-majority: Merge 19166 of 26705 sections in 40 secs ------ merge-majority: Remaining 43890 of 60278 cross in 118 secs ------ Start merge 12 with seed pair `and` and `but` In-group size=6: `but` `and` `for` `by` `from` `upon` ------ merge-majority: Merge 5515 of 13776 sections in 17 secs ------ merge-majority: Remaining 9448 of 25904 cross in 41 secs ------ Start merge 14 with seed pair `and` and `but` In-group size=3: `but` `and` `for` ------ merge-majority: Merge 533 of 7425 sections in 5 secs ------ merge-majority: Remaining 1490 of 14731 cross in 17 secs And the old code crashes with this: ------ Round 830 Next in line: ranked-MI = 2.9873 MI = 7.8775 (`Notes`, `“`) ------ Start merge 830 with seed pair `Notes` and `“` Initial in-group size=2: `“` `Notes` In-group size=2 overlap = 0 of 3222 disjuncts, commonality= 0.00% In-group size=2: `“` `Notes` ------ merge-majority: Merge 0 of 828 sections in 1 secs ------ merge-majority: Remaining 0 of 2394 cross in 2 secs (recomp-all-sim _ #) conclude: The sime need to be redone installed into /usr/local/share/guile/site/3.0/ (define wli (list (Word "the") (Word "a") (Word "this"))) (for-each cog-delete! (cog-get-atoms 'Similarity)) (fcompute-diag-mi-sims star-obj wli 0 5) (define (blah) (for-each cog-delete! 
(cog-get-atoms 'Similarity)) (define e (make-elapsed-secs)) (compute-diag-mi-sims star-obj wli 0 5) (format #t "done serial in ~A secs\n" (e))) no fibers: done serial in 84 secs ; 50 seconds lost in getting the basis done serial in 29 secs done serial in 23 secs done serial in 31 secs done serial in 28 secs done serial in 30 secs done serial in 29 secs w 6 fibers: -- conclude -- zero parallelism improvements done in 54 secs done in 35 secs done in 32 secs done in 37 secs done in 36 secs w 1 fibers 3x CPU use but no speedup. done in 28 secs done in 29 secs done in 89 secs ; wtf done in 55 secs ; ????? done in 164 secs ; ??? done in 51 secs done in 35 secs two fibers: done in 34 secs done in 33 secs Cheesy-thread: (define (cheese) (for-each cog-delete! (cog-get-atoms 'Similarity)) (define e (make-elapsed-secs)) (tcompute-diag-mi-sims star-obj wli 0 5) (pool-join) (format #t "done cheese in ~A secs\n" (e))) done cheese in 14 secs done cheese in 13 secs done cheese in 14 secs done cheese in 13 secs Simple thread pool is a winner. At least 2x speedup. (define wli (list (Word "the") (Word "a") (Word "this") (Word "that"))) done cheese in 18 secs vs done serial in 37 secs -------------------------------------------------------- log noise - done before DONE log merge style DONE stop printing MI DONE change Diag: cycle to 30 s DONE -------------------------------------------------------- n=4 impr 7 "? . , !" 2.0 8 "#f" 3.0 9 "#f" 2.0 10 "They There" 2.0 27 "be have" 2.0 28 "has had have" 1.0 29 "Mr A No" 1.0 30 ": ###LEFT-WALL### _" 3.0 ------ Start merge 8 with seed pair `It` and `He` In-group size=3: `He` `It` `There` ------ Merged into `He It There` in 23 secs WTF why is class node not reported!? ------ Start merge 9 with seed pair `She` and `He It There` In-group size=3: `He It There` `She` `They` ------ Merged into `He It There She They` in 28 secs Ingroup off-by-one.... worse, off by miscounted size of class. 
------ Start merge 28 with seed pair `have` and `has had`
In-group size=2: `has had` `have`

------ Start merge 29 with seed pair `A No` and `Mr`
In-group size=2: `Mr` `A No`

So miscounting class contents

(define mrg (make-merge-majority star-obj 0.7 4))
(mrg (WordClass "A No") (list (Word "A") (Word "No")))
(cog-incoming-by-type (WordClass "A No") 'MemberLink)

(mrg (WordClass "Mr A No") (list (Word "Mr") (WordClass "A No")))
(cog-incoming-by-type (WordClass "Mr A No") 'MemberLink)

--------------------------------------------------------
300000 done in 8 secs; inserting into : TITDE- & TITFM+;
Finished inserting 319740 records
Throw to key `fail-insert' with args `(make-db-adder "UNIQUE constraint failed: Morphemes.subscript")'.

INSERT INTO Morphemes VALUES ('LEFT-WALL', 'LEFT-WALL', '<###LEFT-WALL###>');

(define dbo (dbi-open "sqlite3" "/tmp/dict.db"))
(dbi-query dbo "SELECT * FROM Morphemes WHERE morpheme='LEFT-WALL';")

sqlite> SELECT * FROM Morphemes WHERE morpheme='LEFT-WALL';
LEFT-WALL|LEFT-WALL|<###LEFT-WALL####uni>

(WordClassNode "###LEFT-WALL####uni")

to spinning disk: slowww -- 10846 secs or (/ 378937 10846.) 35 recs/sec

to SSD:
Finished inserting 378937 records in 34699 secs (10.921/sec)
wtf even slower.

--------------------------------------------------------
(nobs (WordNode "the"))
548680.0

where are the memberlinks?
holy cow completely forgot to do this.

add-gram-class-api

(WordClassNode . 502)

(add-wordclass-filter singletons is just .. broken.

add-word-class
add-germ-cset-pair
add-section

export-csets - everything in the object

(define (esc-q STR)
   (string-concatenate
      (map (lambda (CHAR)
            (cond
               ((equal? CHAR #\") "U+0022")
               ((equal? CHAR #\\) "U+005C")
               (else (list->string (list CHAR)))))
         (string->list STR))))

(add-shape-vec-api 'flatten-section CLS SECT

Ugh.
(define cset-obj (make-pseudo-cset-api)) (define covr-obj (add-covering-sections cset-obj)) (define gram-obj (add-gram-class-api covr-obj)) (gram-obj 'fetch-pairs) (gram-obj 'explode-sections) (define sgl-obj (add-singleton-classes gram-obj)) if (not (LLOBJ 'provides 'flatten)) explode for each sigleton in right-stars(singleton) flatten cp -pr r10-mrg-q0.7-c0.4.rdb r10-export.rdb cog-inc-count! (sup 'right-count WRD) (define sup (add-support-api LLOBJ)) star-wild After trimming, 2592 words left, out of 15781 After trimming, 2247 words left, out of 15781 Done computing 441274 pair MI's in 263 secs Finished with MI computations; this took 0.189 hours. Done storing 137361 left-wilds in 47 secs Done storing 15447 right-wilds in 5 secs ; oh no.... want only classes. Will store 441274 csets ??? sections Finished inserting 12082 records in 11 secs (1098.4/sec) cword-list-to-lg-con-list (cog-incoming-by-type WRD-OR-CLA 'MemberLink) make-conseq-predicate - DONE add-linking-filter - DONE linking-trim - DONE Support: found num left= 2622 num right= 136044 in 0 secs Total count N(*,*) = 7779650.0 = 7779650.0 (define (is-word-class? ITEM) (eq? 'WordClassNode (cog-type ITEM))) (linking-trim star-obj is-word-class?) cog-incoming-size-by-type atom (TypeChoice generic-trim pred return T to delete Created 2247 singleton word classes in 289 secs ; (define gcf (add-word-remover star-obj #t) ; -- no (define (is-word-class? ITEM) (eq? 'WordClassNode (cog-type ITEM))) (linking-trim star-obj is-word-class?) (for-each cog-delete! (cog-get-atoms 'ConnectorSeq)) ; zap orphan seqs ; zap all connector-seq marginals (for-each (lambda (CSQ) (if (eq? 1 (cog-incoming-size CSQ)) (cog-delete-recursive! CSQ))) (cog-get-atoms 'ConnectorSeq)) (for-each cog-delete! (cog-get-atoms 'Connector)) (for-each cog-delete! (cog-get-atoms 'ListLink)) ???? (for-each cog-delete! (cog-get-atoms 'WordNode)) Need to recompute marginals? (for-each (lambda (WRD) (if (eq? 1 (cog-incoming-size WRD)) (cog-delete-recursive! 
WRD))) (cog-get-atoms 'WordNode)) WTF: before trim: (cog-report-counts) ((PredicateNode . 36) (ListLink . 433714) (MemberLink . 3184) (AnyNode . 7) (Connector . 18662) (ConnectorDir . 2) (ConnectorSeq . 321820) (Section . 570889) (EvaluationLink . 15781) (TypeNode . 6) (TypeChoice . 3) (AnchorNode . 1) (SchemaNode . 1) (RocksStorageNode . 1) (WordNode . 15082) (WordClassNode . 3172)) Trimmed all pairs in 168 seconds. Trimmed right basis in 1 seconds. Trimmed left basis in 0 seconds. ((PredicateNode . 36) (ListLink . 433714) (MemberLink . 3184) (AnyNode . 7) (Connector . 18662) (ConnectorDir . 2) (ConnectorSeq . 321790) (Section . 367903) (EvaluationLink . 15781) (TypeNode . 6) (TypeChoice . 3) (AnchorNode . 1) (SchemaNode . 1) (RocksStorageNode . 1) (WordNode . 15082) (WordClassNode . 3172)) Finally ((PredicateNode . 36) (MemberLink . 3184) (AnyNode . 7) (Connector . 5150) (ConnectorDir . 2) (ConnectorSeq . 63384) (Section . 297783) (EvaluationLink . 3105) (TypeNode . 6) (TypeChoice . 3) (AnchorNode . 1) (SchemaNode . 1) (RocksStorageNode . 1) (WordNode . 2406) (WordClassNode . 3172)) Total count N(*,*) = 6521715.0 = 6521715.0 Done computing 297783 pair frequencies in 219 secs Done computing 63384 left-wild log frequencies in 36 secs. Done computing 2587 right-wild log frequencies in 1 secs. Done computing 297783 pair MI's in 985 secs Finished left entropy subtotals in 3241 secs ; wtf ????? Finished right entropy subtotals in 250 secs Finished with MI computations; this took 2.438 hours. Will store 297783 csets Finished inserting 59352 records in 1156 secs (51.343/sec) (WordNode "pp") (Connector (WordNode "pp") (ConnectorDir "-")) (Connector (WordNode "pp") (ConnectorDir "+")) (for-each (lambda (WC) (if (eq? 'WordNode (cog-type WC)) (foobar))) (star-obj 'left-basis)) ((PredicateNode . 39) (ListLink . 58667) (MemberLink . 3184) (AnyNode . 7) (Connector . 665) (ConnectorDir . 2) (ConnectorSeq . 6790) (Section . 67564) (EvaluationLink . 2760) (TypeNode . 
6) (TypeChoice . 3) (AnchorNode . 1) (SchemaNode . 1)
 (RocksStorageNode . 1) (WordNode . 2406) (WordClassNode . 3172))

(for-each (lambda (WRD)
      (when (eq? 1 (cog-incoming-size WRD))
         (cog-delete! (star-obj 'right-wildcard WRD))
         (cog-delete! WRD)))
   (cog-get-atoms 'WordNode))

((PredicateNode . 39) (ListLink . 58667) (MemberLink . 3184) (AnyNode . 7)
 (Connector . 665) (ConnectorDir . 2) (ConnectorSeq . 6790)
 (Section . 67564) (EvaluationLink . 2756) (TypeNode . 6) (TypeChoice . 3)
 (AnchorNode . 1) (SchemaNode . 1) (RocksStorageNode . 1)
 (WordNode . 2402) (WordClassNode . 3172))

(batch-all-pair-mi star-obj)

Total count N(*,*) = 2455128.0 = 2455128.0
Done storing 6790 left-wilds in 2 secs
Done storing 2363 right-wilds in 1 secs
Done computing 67564 pair MI's in 28 secs
Finished with MI computations; this took 0.021 hours.

(ConnectorSeq . 232935) ... wtf

(star-obj 'left-basis-size) ; 15447
(star-obj 'right-basis-size) ; 197666

(ConnectorSeq . 321849) ... wtf ...
(- 321849 197666) ; 124183

(define djs (make-aset-predicate (star-obj 'right-basis)))

(for-each (lambda (CS)
      (when
         ;(eq? 1 (cog-incoming-size CS))
         (not (djs CS))
         (format #t "foo ~A" CS)
         (foobar)))
   (cog-get-atoms 'ConnectorSeq))

(star-obj 'right-basis)) ..

the CS is in a ListLink ... (AnyNode "cset-word")
so it's a marginal ... but it's not in any sections?? Huh?

Pre-explode:
(star-obj 'left-basis-size) ; 15447
(star-obj 'right-basis-size) ; 873323
... because this includes shapes! that somehow ended up being stored!!!???

(ConnectorSeq . 321467)
(ShapeLink . 675839)
(+ 321467 675839) ; 997306 so bigger than the basis.
(- 997306 873323) ; 123983 similar to above ... wtf ...

(Section . 570889)
(ShapeLink . 675839)

Many conseqs appear only in marginals!
Gah.... dataset is not even self-consistent!
/home2/linas/src/novamente/data/rocks-archive/run-1-t1234-tsup-1-1-1.rdb time cp -r run-1-t1234-tsup-1-1-1.rdb r12-trim.rdb (define cset-obj (make-pseudo-cset-api)) (cset-obj 'fetch-pairs) (define stars-obj (add-pair-stars cset-obj)) (check-linkability stars-obj) Found 5563 Words that cannot connect! Whoops! Again: run-1-t1234-tsup-1-1-1.rdb ((PredicateNode . 14) (ListLink . 220087) (AnyNode . 2) (Connector . 17090) (ConnectorDir . 2) (ConnectorSeq . 205003) (Section . 855718) (TypeNode . 3) (SchemaNode . 1) (RocksStorageNode . 1) (WordNode . 15083)) After: ((PredicateNode . 14) (ListLink . 204681) (AnyNode . 2) (Connector . 17060) (ConnectorDir . 2) (ConnectorSeq . 204680) (Section . 833833) (TypeNode . 3) (SchemaNode . 1) (RocksStorageNode . 1) (WordNode . 9495)) Repeated (add-support-api "There isn't any cached data on `cset` (len-obj 'total-support-left) do-left-totals called by (define marg (psu 'set-left-marginals DJ)) delete-orphans (star-obj 'left-wildcard cs) is ... cs. But make-pseudo-cset-api Its a csq not deleted ... must have come from ... rebuilt cross-section?? (ListLink (AnyNode "cset-word") (ConnectorSeq (Connector (WordNode "His") (ConnectorDir "-")) (Connector (WordNode "—") (ConnectorDir "+")))) (define csq (ConnectorSeq (Connector (WordNode "His") (ConnectorDir "-")) (Connector (WordNode "—") (ConnectorDir "+")))) (define wl (list (Word "+") (Word "—"))) (define merge-majority (make-merge-majority star-obj 0.8 4 #t)) (define wclass (make-class-node star-obj wl)) (merge-majority wclass wl) (define cwl (cons wclass wl)) (define affected-basis (get-affected-basis star-obj cwl)) (define adj (second affected-basis)) (length adj) (any (lambda (DJ) (equal? 
csq DJ)) adj) ; #t (define psu (add-support-compute star-obj)) (define sup (add-support-api star-obj)) (sup 'left-count csq) ; 20 (psu 'set-left-marginals csq) (sup 'left-count csq) ; 0.0 (< 0 (sup 'left-count csq)) ; #f (define dj-orphan (make-atom-set)) (define dj-store (make-atom-set)) (for-each (lambda (DJ) (define marg (psu 'set-left-marginals DJ)) (if (< 0 (sup 'left-count DJ)) (dj-store marg) (dj-orphan marg))) adj) (any (lambda (DJ) (equal? csq DJ)) (dj-orphan #f)) ; #f ... wtf (any (lambda (DJ) (equal? csq DJ)) (dj-store #f)) ; #f ... wtf (define dors (dj-orphan #f)) (length dors) ; 4939 (define cnt 0) (for-each (lambda (MRG) (if (eq? 'ShapeLink (cog-type MRG)) (set! cnt (+ 1 cnt)))) dors) ; 4471 (for-each (lambda (MRG) (when (not (eq? 'ShapeLink (cog-type MRG))) (format #t "its ~A" MRG) (foobar))) dors) Ta dah! -------------------------------------------------------- 0.5-0.2 -- 19008 0.6-0.3 -- 19009 0.7-0.4 -- 19006 0.8-0.5 -- 19012 (dump-log "/tmp") use git branch r10-mrg-fix for above Maybe 0.7-0.4 has enough members? 
Start up noise 0.6 0.7 0.8 precise n4 -- 19611 19711 19811 21811 (dump-log star-obj "/tmp/r11-p-log" print-log) (dump-log star-obj "/tmp/r11-p-cls" print-merges) imprecise imp-n4 -- 20411 imp-n3 -- 20311 imp-n2 -- 20211 imp-n1 -- 20111 (dump-log star-obj "/tmp/r11-log" print-log) (dump-log star-obj "/tmp/r11-cls" print-merges) export HOSTNAME=localhost export PORT=20112 export PROMPT="scheme@(run-12-imp1)" export COGSERVER_CONF=${CONFIG_DIR}/4-cogserver/cogserver-gram-i1.conf export MST_DB=${ROCKS_DATA_DIR}/r12-imp-q0.7-c0.2-n1.rdb HUGETLB_MORECORE=yes LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libhugetlbfs.so.0 AnonHugePages: 1228800 kB ShmemHugePages: 0 kB ShmemPmdMapped: 0 kB FileHugePages: 0 kB FilePmdMapped: 0 kB HugePages_Total: 32768 HugePages_Free: 28468 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB After loading and start of computation: AnonHugePages: 4624384 kB HugePages_Total: 32768 HugePages_Free: 7103 HugePages_Rsvd: 3 sudo sysctl vm.nr_hugepages=49152 AnonHugePages: 4745216 kB HugePages_Total: 49152 HugePages_Free: 22724 HugePages_Rsvd: 14 AnonHugePages: 9336832 kB ... 9211904 ... 22102016 ... 17385472 HugePages_Total: 49152 HugePages_Free: 21199 ... 21187 ... 20905 ... 20644 HugePages_Rsvd: 6 -------------------------------------------------------- r10-export-500.rdb built from junk dataset... singleton 500 == 67564 sections, 2524 WordClasses 2402 Words 665 connectors oh no! this.759 53 disjuncts is.1125 1 disjuncts a.979 6 disjuncts cup.2268 9 disjuncts (define cset-obj (make-pseudo-cset-api)) (define gram-obj (add-gram-class-api cset-obj)) (define star-obj (add-pair-stars gram-obj)) (star-obj 'fetch-pairs) (cog-incoming-size (Word "is")) ; 1 (cog-incoming-size (WordClassNode "is#uni")) ; 2 r11-imp-q0.7-c0.2-n1.rdb had 250 merges cp -pr r11-imp-q0.7-c0.2-n1.rdb r11-export-100.rdb (MemberLink . 1215) (WordClassNode . 144) (check-gram-dataset star-obj) (cleanup-gram-dataset star-obj) (MemberLink . 581) ; wow. Big drop. 
66 orphan word classes
(WordClassNode . 78)

(define cov (add-covering-sections cset-obj))
(define sin (add-singleton-classes cov))
(cov 'fetch-pairs)
(cov 'implode-sections)
(sin 'create-hi-count-singles 100)
After trimming, 5866 words left, out of 9573
Created 5866 singleton word classes in 1428 secs

So that's a pretty big sample.

(define pss (add-support-api LLOBJ)) .. wild-cards not fetched!

(define cset-obj (make-pseudo-cset-api))
(define gram-obj (add-gram-class-api cset-obj))
(gram-obj 'fetch-pairs)
(define cov (add-covering-sections cset-obj))
(cov 'fetch-pairs)
(cov 'implode-sections)
(check-gram-dataset cov)
(cleanup-gram-dataset cov)
; above failed to load marginals! Again!? wtf?

(cog-keys (cov 'right-wildcard (Word "is")))
(fetch-incoming-by-type pred-node 'EvaluationLink)
Does NOT get the keys ... because there aren't any. So where are they?
(EvaluationLink
   (PredicateNode "*-Direct Sum Wild (gram-class⊕cross-section)")
   (WordNode "is")
   (AnyNode "right-wild-direct-sum"))
Not in the above...

(cog-keys (gram-obj 'right-wildcard (Word "is")))
(ListLink (WordNode "is") (AnyNode "cset-disjunct"))
Does have keys ...
(define pss (add-support-api gram-obj))
(pss 'right-count (Word "is"))
Does work. Why are we so confused, then?

(MemberLink . 581)
(WordClassNode . 78)
(define sin (add-singleton-classes cov))
(sin 'create-hi-count-singles 100)

---------------
(define cset-obj (make-pseudo-cset-api))
(define gram-obj (add-gram-class-api cset-obj))
(gram-obj 'fetch-pairs)
(cog-keys (gram-obj 'right-wildcard (Word "is")))
(define pss (add-support-api gram-obj))
(pss 'right-count (Word "is"))
OK, that looks good ...

(check-gram-dataset gram-obj)
(cleanup-gram-dataset gram-obj)
OK, so that ... deleted the keys. Whoops!
Calling (gram-obj 'fetch-pairs) does NOT restore the keys!
(fetch-atom (gram-obj 'right-wildcard (Word "is")))
also does not restore them ... because they've been deleted from the DB
already.
(define star-obj (add-pair-stars gram-obj))
(define words (make-aset-predicate (star-obj 'left-basis)))
(words (Word "is")) ; #t

(define (have-keys)
   (not (nil? (cog-keys (gram-obj 'right-wildcard (Word "is"))))))

(zap-word-marginals star-obj)
(have-keys) ; still OK
(zap-conseq-marginals star-obj)
(have-keys) ; still OK
(zap-word-marginals star-obj)
(have-keys) ; still OK
(trim-linkage star-obj)
(have-keys) ; #f ... so it was trimmed as unlinkable! ?really?

(define is-in-connector? (make-linkable-pred star-obj))
(is-in-connector? (Word "is")) ; #t correct ...
(define ok-conseq? (make-conseq-predicate star-obj is-in-connector?))
(define (good-elt? SECT)
   (and (is-in-connector? (star-obj 'left-element SECT))
      (ok-conseq? (star-obj 'right-element SECT))))

(define cnt 0)
(for-each (lambda (SCT)
      (when (not (good-elt? SCT)) (set! cnt (+ 1 cnt))))
   (star-obj 'right-stars (Word "is")))

OK ... So what's hammering it?
.... bug in add-trimmer. FIXED.

--------------------
(define cset-obj (make-pseudo-cset-api))
(define gram-obj (add-gram-class-api cset-obj))
(gram-obj 'fetch-pairs)
(cleanup-gram-dataset gram-obj)
(cleanup-gram-dataset gram-obj)
(check-gram-dataset gram-obj)
(define cov (add-covering-sections gram-obj))
(define sin (add-singleton-classes cov))
(sin 'create-hi-count-singles 100)

Now what?
(define pss (add-support-api gram-obj))
(pss 'right-count (Word "is")) ; 71894

(define pss (add-support-api cov))
(pss 'right-count (Word "is")) ; 0
(fetch-atom (cov 'right-wildcard (Word "is")))
(pss 'right-count (Word "is")) ; 52069 ... Oi.

(define cset-obj (make-pseudo-cset-api))
(define cov (add-covering-sections cset-obj))
(cov 'fetch-pairs)
(cov 'implode-sections)
(check-gram-dataset cov)
(cleanup-gram-dataset cov)
(define sin (add-singleton-classes cov))
(sin 'create-hi-count-singles 100)
After trimming, 5866 words left, out of 9573
Created 5866 singleton word classes in 287 secs

(ListLink . 412454) (Connector . 17413) (ConnectorSeq . 216882) (Section . 615868) (MemberLink .
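Why the fetch cannot bring the keys back: a toy model of delete versus extract, as I understand the distinction. This is only a sketch in Python with a dict standing in for the RocksDB file and another for the AtomSpace; the `ToyStore` class and its method names are invented for illustration and are not the real StorageNode API.

```python
class ToyStore:
    """An in-RAM cache sitting in front of a persistent backing store."""

    def __init__(self):
        self.disk = {}  # stands in for the on-disk database
        self.ram = {}   # stands in for the in-memory AtomSpace

    def store(self, key, value):
        self.ram[key] = value
        self.disk[key] = value

    def extract(self, key):
        # Remove from RAM only; the on-disk copy survives.
        self.ram.pop(key, None)

    def delete(self, key):
        # Remove from RAM and from disk; nothing is left to fetch later.
        self.ram.pop(key, None)
        self.disk.pop(key, None)

    def fetch(self, key):
        # Reload from disk if it is there; returns None when it is gone.
        if key in self.disk:
            self.ram[key] = self.disk[key]
        return self.ram.get(key)


db = ToyStore()
db.store("wild-is", {"right-count": 71894})
db.store("wild-a", {"right-count": 1})
db.extract("wild-is")   # recoverable
db.delete("wild-a")     # gone for good
restored = db.fetch("wild-is")
lost = db.fetch("wild-a")
```

In this picture the cleanup behaved like `delete`, which is why neither `'fetch-pairs` nor `fetch-atom` could restore the marginals afterwards.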
6447) (EvaluationLink . 9574) (WordNode . 9495) (WordClassNode . 5944) (define gcf (add-word-remover cov #t)) (batch-all-pair-mi gcf) Wrong! (gcf 'left-basis-size) ; 5944 (gcf 'right-basis-size) ; 618 wtf wrong (define (is-word-class? ITEM) (eq? 'WordClassNode (cog-type ITEM))) (define ok-conseq? (make-conseq-predicate cov is-word-class?)) (define cnt 0) (for-each (lambda (DJ) (when (ok-conseq? DJ) (set! cnt (+ 1 cnt)))) (cov 'right-basis)) cnt ; 618 argh. before starting flattening this is 570 (cog-incoming-size (WordClassNode "is#uni")) (define basis-word? (make-aset-predicate (cov 'left-basis))) (basis-word? (WordClassNode "is#uni") ; #t didn't get flattened... (Section (ctv 1 0 4) (WordClassNode "is#uni") (ConnectorSeq (Connector (WordNode "story") (ConnectorDir "-")) (Connector (WordNode "the") (ConnectorDir "+")))) Choices: 1) flatten everything every time 2) -- work with shapes -- if flatten returns not #f then cross/recross. make-cross-sections SEC make-section XES get-cross-sections get-section 544 weight (CrossSection (ctv 1 0 7) (WordNode "calculated") (ShapeLink (WordNode "for") (Connector ----------------- wtf (ConnectorDir "-")) (Connector (WordNode "the") (ConnectorDir "+")))) (define s (Section (WordNode "for") (ConnectorSeq (Connector (WordNode "calculated") (ConnectorDir "-")) (Connector (WordNode "the") (ConnectorDir "+"))))) (define cset-obj (make-pseudo-cset-api)) (define cov (add-covering-sections cset-obj)) (cov 'fetch-pairs) (check-gram-dataset cov) ; (cleanup-gram-dataset cov) (define sin (add-singleton-classes cov)) (sin 'create-hi-count-singles 100) Created 5866 singleton word classes in 5687 secs 5283374 OK cons Found 494190 unexpected ConnectorSeq! Found 129 Words that are not in Connectors! (ListLink . 120485) (MemberLink . 6447) (Connector . 26723) (ConnectorSeq . 629192) (Section . 2208614) (EvaluationLink . 8726) (WordNode . 9495) (WordClassNode . 
5944) (define gcf (add-word-remover cov #t)) (gcf 'left-basis-size) ; 5944 (gcf 'right-basis-size) ; 206202 OK, I guess (batch-all-pair-mi gcf) Finished with MI computations; this took 0.269 hours. (define (is-word-class? ITEM) (and (cog-atom? ITEM) (eq? 'WordClassNode (cog-type ITEM)))) (linking-trim cov is-word-class?) Trimmed all pairs in 1604 seconds. Trimmed marginals+basis in 628 seconds. (Section . 720999) (ConnectorSeq . 252294) (WordNode . 9495) (WordClassNode . 5944) (MemberLink . 6447) (cleanup-gram-dataset cov) (Section . 717481) (ConnectorSeq . 199077) (WordNode . 5866) (WordClassNode . 5881) (MemberLink . 6249) again ... (cleanup-gram-dataset cov) (Section . 717035) (ConnectorSeq . 198772) (WordNode . 5837) (WordClassNode . 5877) (MemberLink . 6232) (print-matrix-summary-report cov) Rows: 15226 Columns: 998203 Size: 2076009.0 Total observations: 22942644.0 Entropy Total: 18.271 Left: 16.273 Right: 8.6970 Total MI: 5.5865 (cov 'explode-sections) (batch-all-pair-mi cov) Finished with MI computations; this took 2.549 hours. Rows: 5877 Columns: 933026 Size: 2348063.0 Total observations: 21856976.0 Entropy Total: 18.680 Left: 16.229 Right: 8.5677 Total MI: 6.1172 add-word-remover -> add-class-filter -------------------------------------- (define wall (WordNode "###LEFT-WALL###")) (define wallu (WordClassNode "###LEFT-WALL####uni")) (define wallc (WordClassNode ": ###LEFT-WALL### \"")) (first (cog-incoming-by-type wallu 'Section)) (Section (ctv 1 0 6) (WordClassNode "###LEFT-WALL####uni") (ConnectorSeq (Connector (WordClassNode "V#uni") (ConnectorDir "+")))) V parses V VI VI . (prt-element (list-ref (cog-incoming-by-type wallu 'Section) 0)) link-generator --count=2 --language=dict-en Program received signal SIGSEGV, Segmentation fault. 
../../link-grammar/tokenize/tokenize.c:3012 3012 dn[dict->num_categories-1].right = NULL; print dict->num_categories cset-to-lg-dj called by add-germ-cset-pair Number of categories: 6 (define wallu (WordClassNode "###LEFT-WALL####uni")) (define wam (Connector wallu (ConnectorDir "-"))) (cog-incoming-size wam) ; 15317 OK `C A W H D S ) ( 2 1 F` 175 (log-cluster (cluster-entropy WCLASS)) (define log-dataset-stuff (make-merge-logger LLOBJ)) covr-obj nan in mmtq nan in top-pair mi (define *-log-anchor-* (covr-obj 'wild-wild)) (define (chop p) (define vl (cog-value *-log-anchor-* p)) (define vs (take (cog-value->list vl) 175)) (define vt (cog-type vl)) (cog-set-value! *-log-anchor-* p (cog-new-value vt vs)) *unspecified*) (define (show p) (cog-value *-log-anchor-* p)) (show (Predicate "mmt-q")) (show (Predicate "top-pair mi")) (show (Predicate "top-pair ranked-mi")) (show (Predicate "sparsity")) (show (Predicate "mmt-entropy")) (show (Predicate "left dim")) (show (Predicate "right dim")) (show (Predicate "left-count")) (show (Predicate "right-count")) (show (Predicate "total entries")) (show (Predicate "left-entropy")) (show (Predicate "right-entropy")) (show (Predicate "total-entropy")) (show (Predicate "num classes")) (chop (Predicate "mmt-q")) (chop (Predicate "top-pair mi")) (chop (Predicate "top-pair ranked-mi")) (chop (Predicate "sparsity")) (chop (Predicate "mmt-entropy")) (chop (Predicate "left dim")) (chop (Predicate "right dim")) (chop (Predicate "left-count")) (chop (Predicate "right-count")) (chop (Predicate "total entries")) (chop (Predicate "left-entropy")) (chop (Predicate "right-entropy")) (chop (Predicate "total-entropy")) (chop (Predicate "num classes")) ------ Round 175.0 Next in line: ranked-MI = 7.1101 MI = 6.6435 (`I’m`, `perhaps by where no after without with each from since these had during`) ranked-MI = 7.0304 MI = 6.0500 (`I’m Not It’s`, `Then`) ------ Start merge 175.0 with seed pair `I’m` and `perhaps by where no after wit hout with each from 
since these had during` In-group size=6: `perhaps by where no after without with each from since these h ad during` `I’m` `of` `are` `were` `others` ------ merge-majority: Merge 4327 of 11013 sections in 10 secs (WordNode "I’m") (WordNode "of") (define XROS (covr-obj 'get-pair (WordNode "of") (ShapeLink (WordNode "this") (Connector (VariableNode "$connector-word") (ConnectorDir "-")) (Connector (WordNode "vision") (ConnectorDir "+"))))) (define SIM-ID "shape-mi") (load "/home/ubuntu/src/learn/scm/common.scm") (load "/home/ubuntu/src/learn/scm/utilities.scm") (load "/home/ubuntu/src/learn/scm/gram-class/log-merge.scm") (load "/home/ubuntu/src/learn/scm/gram-class/shape-vec.scm") (load "/home/ubuntu/src/learn/scm/gram-class/shape-project.scm") (load "/home/ubuntu/src/learn/scm/gram-class/gram-majority.scm") (load "/home/ubuntu/src/learn/scm/gram-class/agglo-rank.scm") (define na (list-ref (cog-value->list (show (Predicate "left dim"))) 175)) (define wl (list (WordClass "perhaps by where no after without with each from since these had during") (Word "I’m") (Word "of") (Word "are") (Word "were") (Word "others"))) (define (ga LLOBJ WRD-LIST) (define dj-set (make-atom-set)) (define wrd-set (make-atom-set)) (for-each wrd-set WRD-LIST) (define (pair-margins PAIR) (wrd-set (LLOBJ 'left-element PAIR)) (dj-set (LLOBJ 'right-element PAIR))) (define (cross-margins PAIR) (for-each pair-margins (LLOBJ 'make-cross-sections PAIR))) (define (expand-margins PAIR) (format #t "yo ~A" PAIR) (dj-set (LLOBJ 'right-element PAIR)) (if (equal? 
'Section (cog-type PAIR)) (cross-margins PAIR) (let ((sect (LLOBJ 'get-section PAIR))) (pair-margins sect) (cross-margins sect)))) (for-each (lambda (WRD) (for-each expand-margins (LLOBJ 'right-stars WRD))) WRD-LIST) (define affected-djs (dj-set #f)) (for-each (lambda (DJ) (for-each wrd-set (LLOBJ 'left-duals DJ))) affected-djs) (list (wrd-set #f) affected-djs) ) Throw to key `bad-summation' with args `(compute-total-entropy "Left and right entropy sums fail to be equal: 23.73641651069839 27.248554685599636\n")'. port 20212 (define wc (WordClass "perhaps by where no after without with each from since these had during I’m of are were others.i")) ((make-simmer covr-obj) wc wc) (log-class wclass) (define (show p) (cog-value *-log-anchor-* p)) (define (pln p) (length (cog-value->list (show p)))) (pln (Predicate "mmt-q")) (pln (Predicate "class")) (batch-all-pair-mi covr-obj) Finished with MI computations; this took 4.449 hours. (define wc (WordClassNode "perhaps by where no after without with each from since these had during I’m of are were others")) (smi 'mmt-marginal wc) ; 0.0 (define psu (add-support-compute covr-obj)) ==================================================== run-12 sanity checks. noise=3 ext Rows: 10031 Columns: 925134 == log_2 13.2922 x 19.8193 Size: 1937274.0 log_2 size: 20.8856 Fraction non-zero: 2.0876E-4 Sparsity: 12.2259 Rarity: 4.32986 Total obs: 22643824.0 Avg obs/pair: 11.6885 log_2 avg: 3.54702 Entropy Total: 18.0602 Left: 16.0128 Right: 8.39150 Total MI: 5.5480 Left Right Avg-left Avg-right ---- ----- -------- --------- Support (l_0) 163.4 3.1862E+4 Count (l_1) 5818. 3.1884E+5 35.61 10.01 Length (l_2) 917.6 8105. 5.616 .2544 RMS Count 835.1 7978. 
5.111 .2504 MM^T support=87571358.0 count=167020867736.0 entropy=15.487 Now recomp from scratch: (define btr (batch-transpose star-obj)) (btr 'clobber) (btr 'mmt-marginals) (print-matrix-summary-report star-obj) Rows: 10031 Columns: 1447377 == log_2 13.2922 x 20.4650 Size: 3402228.0 log_2 size: 21.6980 Fraction non-zero: 2.3434E-4 Sparsity: 12.0591 Rarity: 4.81945 Total obs: 34051699.0 Avg obs/pair: 10.0086 log_2 avg: 3.32318 Entropy Total: 18.0602 Left: 16.0128 Right: 8.39150 Total MI: 5.5480 Left Right Avg-left Avg-right ---- ----- -------- --------- Support (l_0) 161.0 2.9285E+4 Count (l_1) 8541. 3.3810E+5 53.05 11.55 Length (l_2) 2433. 1.7301E+4 15.11 .5908 RMS Count 2233. 1.7197E+4 13.87 .5872 Crap. That is totally. completly different. WTF. r12-log-q0.7-c0.2-n3.dat # N,rows,cols,lcnt,rcnt,size 1 9495 1015850 22643824 22643824 2717117 1062 10031 925134 22643824 22643824 1937274 And so counts are completely violated. The report says detailed balance was preserved, but recalculating it broke everything. wtf. ------------- Fresh start: Rows: 9495 Columns: 1015850 == log_2 13.2130 x 19.9543 Size: 2717117.0 log_2 size: 21.3736 Fraction non-zero: 2.8170E-4 Sparsity: 11.7936 Rarity: 4.79004 Total obs: 22643824.0 Avg obs/pair: 8.33377 log_2 avg: 3.05897 Entropy Tot: 19.4210 Left: 16.5007 Right: 8.46831 MI: 5.54801 MM^T support=193557505.0 count=131743839840.0 entropy=18.117 Left Right Avg-left Avg-right ---- ----- -------- --------- Support (l_0) 163.4 3.1862E+4 Count (l_1) 5818. 3.1884E+5 35.61 10.01 Length (l_2) 917.6 8105. 5.616 .2544 RMS Count 835.1 7978. 5.111 .2504 ((PredicateNode . 26) (ListLink . 418856) (AnyNode . 7) (Connector . 17062) (ConnectorDir . 2) (ConnectorSeq . 204680) (Section . 833833) (ShapeLink . 811170) (CrossSection . 1883284) (VariableNode . 1) (EvaluationLink . 9496) (TypeNode . 6) (TypeChoice . 3) (AnchorNode . 1) (SchemaNode . 1) (RocksStorageNode . 1) (WordNode . 9495)) Above looks consistent with the summary report. 
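The "looks consistent" claim for the fresh start can be checked by arithmetic. A small Python sketch, assuming (as these notes do elsewhere) that rows are the WordNodes, columns are ConnectorSeq plus ShapeLink, and matrix size is Section plus CrossSection; `check_report` is a hypothetical helper:

```python
def check_report(counts, rows, cols, size):
    """Compare atom-type counts against a matrix summary report.

    Returns a list of (label, expected, actual) for every mismatch.
    """
    derived = {
        "rows": counts["WordNode"],
        "cols": counts["ConnectorSeq"] + counts["ShapeLink"],
        "size": counts["Section"] + counts["CrossSection"],
    }
    reported = {"rows": rows, "cols": cols, "size": size}
    return [
        (label, reported[label], derived[label])
        for label in ("rows", "cols", "size")
        if reported[label] != derived[label]
    ]


# Counts and report values copied from the fresh-start block above.
fresh_counts = {
    "WordNode": 9495,
    "ConnectorSeq": 204680,
    "ShapeLink": 811170,
    "Section": 833833,
    "CrossSection": 1883284,
}
mismatches = check_report(fresh_counts, rows=9495, cols=1015850, size=2717117)
```

An empty `mismatches` list means the atom counts and the summary report agree.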
(define btr (batch-transpose star-obj)) (btr 'clobber) (btr 'mmt-marginals) Looks good. (after recomput of marginals). Now do: (in-group-cluster covr-obj 0.7 0.2 3 200 1) Rows: 9496 Columns: 1018843 == log_2 13.2131 x 19.9585 Size: 2714611.0 log_2 size: 21.3723 Fraction non-zero: 2.8058E-4 Sparsity: 11.7993 Rarity: 4.78651 Total obs: 22643824.0 Avg obs/pair: 8.34146 log_2 avg: 3.06030 Entropy Tot: 19.4171 Left: 16.5051 Right: 8.47292 MI: 5.54801 MM^T support=191289125.0 count=131527914076.0 entropy=18.086 so total observations unchanged, other things changes sligtly, seems OK to me. ((PredicateNode . 55) (ListLink . 419610) (MemberLink . 6) (AnyNode . 7) (Connector . 17064) (ConnectorDir . 2) (ConnectorSeq . 206972) (Section . 833009) (ShapeLink . 811871) (CrossSection . 1881602) (VariableNode . 1) (EvaluationLink . 9497) (TypeNode . 6) (TypeChoice . 3) (AnchorNode . 1) (SimilarityLink . 20300) (SchemaNode . 1) (RocksStorageNode . 1) (WordNode . 9495) (WordClassNode . 1)) section+cross= 2714611 agrees w/ reported size. conseq+shape = 1018843 agrees w/ numcols (btr 'clobber) (btr 'mmt-marginals) No change. Cool. Exit, restart. still looks good. (cog-reoprt-counts) is same as above, except for: (ShapeLink . 831218) so 20K more shapes... (CrossSection . 1925192) more crosses. OK, code needs to delete not extract, yeah!? (define btr (batch-transpose star-obj)) (btr 'clobber) (btr 'mmt-marginals) Rows: 9496 Columns: 1038190 == log_2 13.2131 x 19.9856 Size: 2758201.0 log_2 size: 21.3953 Fraction non-zero: 2.7977E-4 Sparsity: 11.8034 Rarity: 4.79592 Total obs: 22784833.0 Avg obs/pair: 8.26076 log_2 avg: 3.04627 Entropy Tot: 19.4171 Left: 16.5051 Right: 8.47292 MI: 5.54801 MM^T support=194867655.0 count=132418291939.0 entropy=18.130 Whoops, there we go: 20K more columns, 44K more entries, 143K more observations. is ... delete not working? ((PredicateNode . 54) (ListLink . 419610) (MemberLink . 6) (AnyNode . 7) (Connector . 17064) (ConnectorDir . 2) (ConnectorSeq . 
206972) (Section . 833009) (ShapeLink . 831218) (CrossSection . 1925192) (VariableNode . 1) (EvaluationLink . 9497) (TypeNode . 6) (TypeChoice . 3) (AnchorNode . 1) (SchemaNode . 1) (RocksStorageNode . 1) (WordNode . 9495) (WordClassNode . 1)) Section+CrossSection= 2758201 size is in agreement ConnectorSeq+Shape = 1038190 is in agreement (VariableNode "$connector-word") (cog-incoming-size (first (cog-incoming-set v))) but these are all Shapes. Cleanup is OK. So, after one merge, there are 1883284 CrossSections in the file. Crap. That's exactly how many we started with, too ... Are they even balanced? (define (check-bal SECT) (define xes (star-obj 'get-cross-sections SECT)) (define exc (cog-count SECT)) (for-each (lambda (XS) (when (not (> 1.0e-10 (abs (- (cog-count XS) exc)))) (format #t "unbalancedat ~A vs ~A at ~A\n" exc (cog-count XS) XS) (foobar))) xes) (define mes (covr-obj 'make-cross-sections SECT)) (when (not (equal? (length xes) (length mes))) (format #t "aiiee missing ~A ~A\n" (length xes) (length mes)) (foobar))) (for-each check-bal (cog-get-atoms 'Section)) Yeah, they are balanced... at least. (load-atoms-of-type 'CrossSection) (for-each cog-delete! (cog-get-atoms 'CrossSection)) ------------------------------------------------------- noise-3 intially: Rows: 10031 Columns: 925134 == log_2 13.2922 x 19.8193 Size: 1937274.0 log_2 size: 20.8856 Fraction non-zero: 2.0876E-4 Sparsity: 12.2259 Rarity: 4.32986 Total obs: 22643824.0 Avg obs/pair: 11.6885 log_2 avg: 3.54702 MM^T support=87571358.0 count=167020867736.0 entropy=15.487 but above is as things were, after the crash. How will it fare under MMT recompute? Cleanup removed 11 words... 
(define btr (batch-transpose star-obj)) (btr 'clobber) (btr 'mmt-marginals) Now: Rows: 10020 Columns: 925040 == log_2 13.2906 x 19.8192 Size: 1937115.0 log_2 size: 20.8855 Fraction non-zero: 2.0899E-4 Sparsity: 12.2243 Rarity: 4.33060 Total obs: 22638279.0 Avg obs/pair: 11.6866 log_2 avg: 3.54678 Entropy Tot: 18.0602 Left: 16.0128 Right: 8.39150 MI: 5.54801 MM^T support=87567779.0 count=167007017237.0 entropy=15.487 Huh. I bet if cleanup didn't run, nothing would have changed. 530 word-classes ---------------------- Noise-4: after decrossing: Rows: 10039 Columns: 874667 == log_2 13.2933 x 19.7384 Size: 1878624.0 log_2 size: 20.8412 Fraction non-zero: 2.1395E-4 Sparsity: 12.1905 Rarity: 4.32539 Total obs: 22643824.0 Avg obs/pair: 12.0534 log_2 avg: 3.59137 Entropy Tot: 17.9911 Left: 15.9154 Right: 8.28900 MI: 5.54801 MM^T support=101954590.0 count=187199529824.0 entropy=15.761 (ListLink . 333411) (MemberLink . 10503) (ConnectorSeq . 229612) (Section . 627362) (ShapeLink . 1279162) (CrossSection . 1440465) (WordClassNode . 1057)) after cleanup: (ListLink . 333338) (MemberLink . 10483) (ConnectorSeq . 229578) (Section . 627159) (ShapeLink . 698022) (CrossSection . 1440114) (WordClassNode . 536) -- wtf- really!? Rows: 10039 Columns: 874667 == log_2 13.2933 x 19.7384 Size: 1878624.0 log_2 size: 20.8412 Fraction non-zero: 2.1395E-4 Sparsity: 12.1905 Rarity: 4.32539 Total obs: 22643824.0 Avg obs/pair: 12.0534 log_2 avg: 3.59137 Entropy Tot: 17.9911 Left: 15.9154 Right: 8.28900 MI: 5.54801 MM^T support=101954590.0 count=187199529824.0 entropy=15.761 (WordClassNode . 1057) --- what? so cleanup did not actually remove these ... Hmm. Cause of .. memberlinks? deleted by (zap-word-marginals star-obj) ohhhh .. they are in the logs! But why did load fail? They are in values in the logs... The load fails, since apparently, atoms in values are not stored as atoms... esp if they are in a LinkValue ahh, no .. if they are deleted, the associated LinkValue is NOT deleted! Ah hah! 
and we cannot find these, by design, because we do not want to create
an index on the Values. Ah hah.

(define ewc (filter
   (lambda (WCL) (equal? 0 (cog-incoming-size-by-type WCL 'Member)))
   (cog-get-atoms 'WordClass)))
(length ewc)
$3 = 521
(car ewc)
$5 = (WordClassNode "He It There She They")
(cog-incoming-size (car ewc))
$6 = 0

(define btr (batch-transpose star-obj))
(btr 'clobber)

the clobber changes
(ListLink . 333338) into
(ListLink . 343924)
Huh. Wonder what that is about? Nothing else changes, except one more
EvaluationLink ....

(btr 'mmt-marginals)

Rows: 10022 Columns: 927600 == log_2 13.2909 x 19.8231
Size: 2067273.0 log_2 size: 20.9793
Fraction non-zero: 2.2237E-4 Sparsity: 12.1347 Rarity: 4.42228
Total obs: 24424447.0 Avg obs/pair: 11.8148 log_2 avg: 3.56253
Entropy Tot: 17.9911 Left: 15.9154 Right: 8.28900 MI: 5.54801
MM^T support=113249095.0 count=211466542579.0 entropy=15.888

Well, that is substantially different from before. But .. whatever.
Plow ahead

====================================================
(define wclasses (cog-get-atoms 'WordClass))
(compute-diag-mi-sims star-obj wclasses 0 (length wclasses))
(define all-words (rank-words star-obj))
(compute-diag-mi-sims star-obj (take all-words 1100) 0 1100)

====================================================
noise=2
Entropy Tot: 17.9826
Left and right entropy sums fail to be equal:
20.173277930295733 17.98237494387546

So lsum seems too big ...
lsum is sum over (frqobj 'left-wild-entropy x) where x is right-elt

(define frqobj (add-pair-freq-api star-obj #:nothrow #t))
(define (left-sum FN)
   (fold
      (lambda (right-item sum) (+ sum (FN right-item)))
      0 (star-obj 'right-basis)))

(define lsum (left-sum (lambda (x) (frqobj 'left-wild-entropy x))))

clobber does not fix it. Search for bad ones..?
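For reference, the identity that the `bad-summation` check relies on, as I read it: summing the per-column partial entropies over all columns must give the same number as summing the per-row partial entropies over all rows, since both are just the total -sum p log_2 p taken in a different order. A toy Python sketch with a made-up 3x2 count matrix (nothing here touches the real matrix API):

```python
import math


def pair_entropy(count, total):
    """Contribution -p*log2(p) of one pair with the given count."""
    p = count / total
    return -p * math.log2(p)


def entropy_sums(matrix):
    """Return (sum over columns, sum over rows) of partial entropies."""
    total = sum(sum(row) for row in matrix)
    ncols = len(matrix[0])
    # Per-column partial entropy: sum over the rows in that column.
    col_parts = [
        sum(pair_entropy(row[j], total) for row in matrix if row[j] > 0)
        for j in range(ncols)
    ]
    # Per-row partial entropy: sum over the columns in that row.
    row_parts = [
        sum(pair_entropy(c, total) for c in row if c > 0) for row in matrix
    ]
    return sum(col_parts), sum(row_parts)


toy = [[4, 0], [7, 1], [0, 12]]
lsum, rsum = entropy_sums(toy)
```

When `lsum` and `rsum` disagree by more than rounding, at least one cached partial entropy does not correspond to the counts currently in the matrix.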
(define (cache-left-entropy RIGHT-ITEM) (define ent (compute-left-entropy RIGHT-ITEM)) (define fent (/ ent (frqobj 'left-wild-freq RIGHT-ITEM))) (frqobj 'set-left-wild-entropy RIGHT-ITEM ent fent)) ------ This checks it: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< (define frqobj (add-pair-freq-api star-obj #:nothrow #t)) ; Compute the left-wild entropy summation: ; h_left(y) = -sum_x P(x,y) log_2 P(x,y) (define (compute-left-entropy RIGHT-ITEM) (fold (lambda (PAIR sum) (+ sum (frqobj 'pair-entropy PAIR))) 0 (star-obj 'left-stars RIGHT-ITEM))) (define nok 0) (define bad #f) (define (is-eq? a b) (> 1.0e-10 (abs (- a b)))) (define (check-left-entropy RIGHT-ITEM) (define ent (compute-left-entropy RIGHT-ITEM)) ; (define fent (/ ent (frqobj 'left-wild-freq RIGHT-ITEM))) (define ment (frqobj 'left-wild-entropy RIGHT-ITEM)) ; (define mfent (frqobj 'left-wild-fentropy RIGHT-ITEM)) (set! nok (+ 1 nok)) (when (not (is-eq? ent ment)) (set! bad RIGHT-ITEM) (format #t "Not equal ~D ~A was=~A for\n~A" nok ent ment RIGHT-ITEM) (foobar)) ) (for-each check-left-entropy (star-obj 'right-basis)) First one fails: Not equal 1 0 2.069669352634443e-6 for (ShapeLink (WordNode "James") (Connector (VariableNode "$connector-word") (ConnectorDir "-")) (Connector (WordNode "replied") (ConnectorDir "+"))) (define bstars (star-obj 'left-stars bad)) (length bstars) $5 = 0 So ... why was this not deleted? Start merge 890 with seed pair `April` and `February October March Decemb er January July August September May June an some these each June` In-group size=2: `February October March December January July August September May June an some these each June` `April` (define nost 0) (define nexm 0) (define (no-stars RIGHT-ITEM) (define lstars (star-obj 'left-stars RIGHT-ITEM)) (set! nexm (+ 1 nexm)) (when (equal? 0 (length lstars)) (set! 
nost (+ 1 nost))))
(for-each no-stars (star-obj 'right-basis))

(define nost 0)
(define nexm 0)
(define (no-stars LEFT-ITEM)
   (define rstars (star-obj 'right-stars LEFT-ITEM))
   (set! nexm (+ 1 nexm))
   (when (equal? 0 (length rstars))
      (set! nost (+ 1 nost))))
(for-each no-stars (star-obj 'left-basis))

----------------
Again:
(for-each check-left-entropy (star-obj 'right-basis))
very first one fails ... it's a shape... an some these each
in one cross-section only...
Not equal 1 3.963225621536661e-6 3.962690096800633e-6 for

post-merge:
Total obs: 22640554.0 vs
Total obs: 22643824.0 << old
(/ 4 (frqobj 'left-wild-freq bad)) << 22643824.0
so, unbalanced because the obs count changed

recompute mmt marginals, then:
Total obs: 22640554.0
so let's see if that holds ....
(in-group-cluster covr-obj 0.7 0.2 2 200 1)
after one run:
Total obs: 22640554.0

`(compute-total-entropy "Left and right entropy sums fail to be equal: 17.98027186892881 17.981837223401804\n")'

(define nok 0)
(define bad #f)
(for-each check-left-entropy (star-obj 'right-basis))
Same fail as before:
Not equal 1 3.963225621536661e-6 3.962690096800633e-6 for

How is this possible? Was it not recomputed? It's clearly in the right
basis ....
(btr 'mmt-marginals) -- again
(frqobj 'left-wild-entropy bad)
$28 = 3.962690096800633e-6
(compute-left-entropy bad)
$29 = 3.963225621536661e-6

So ... recomputing mmt did not recompute the marginals!

all-mmt-marginals
   (setup-supports)
      (batch-left-support)
      (batch-right-support)
         (scomp-obj 'all-right-marginals)
         (scomp-obj (add-support-compute star-obj))

... in short, entropy is not recomputed. Great.
So why do we need the entropy?

(define enc (add-entropy-compute star-obj))
(enc 'cache-all-left-entropy) -- Loop over all columns.
'cache-all-right-entropy

who calls compute-total-entropy ? No one!
O dear. None of this was ever really needed.
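The two numbers in the "Not equal" message look like nothing more than a stale total. A Python sketch, assuming that the offending shape holds a single pair with count 4 (which is what the `(/ 4 ...)` line above suggests) and that the pair entropy is -p log_2 p with p = count / total:

```python
import math


def pair_entropy(count, total):
    """Contribution -p*log2(p) of one pair with the given count."""
    p = count / total
    return -p * math.log2(p)


OLD_TOTAL = 22643824.0  # total observations before the merge
NEW_TOTAL = 22640554.0  # total observations after the merge

cached = pair_entropy(4, OLD_TOTAL)  # what the stale marginal still holds
fresh = pair_entropy(4, NEW_TOTAL)   # what a recompute from pairs gives
```

Under those assumptions `cached` comes out at about 3.96269e-6 and `fresh` at about 3.96323e-6, matching the cached and recomputed values in the message. So the marginal was computed once against the old total and never refreshed.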
left-wild-freq cache-all-left-freqs Solution: (star-obj 'clobber) (define cfq (make-compute-freq star-obj)) (cfq 'init-freq) (cfq 'cache-all-left-freqs) (cfq 'cache-all-right-freqs) (define enc (add-entropy-compute star-obj)) (enc 'cache-all-left-entropy) (enc 'cache-all-right-entropy) (define btr (batch-transpose star-obj)) (btr 'mmt-marginals) ------------------ 878:16 8 (perform-merge _ _ _) 102:16 7 (compute-sim # #) ------ Start merge 1342 with seed pair `of` and `from` Initial in-group size=8: `from` `of` `and` `is` `by` `under` `upon` `through` In-group size=8 overlap = 0 of 91448 disjuncts, commonality= 0.00% In-group size=7 overlap = 0 of 90667 disjuncts, commonality= 0.00% In-group size=6 overlap = 0 of 89727 disjuncts, commonality= 0.00% In-group size=5 overlap = 0 of 88880 disjuncts, commonality= 0.00% In-group size=4 overlap = 0 of 84496 disjuncts, commonality= 0.00% In-group size=3 overlap = 0 of 73777 disjuncts, commonality= 0.00% In-group size=2 overlap = 0 of 39114 disjuncts, commonality= 0.00% In-group size=2: `from` `of` ------ merge-majority: Merge 0 of 6660 sections in 3 secs ------ merge-majority: Remaining 0 of 32454 cross in 25 secs OK, so this dataset is just ... broken beyond repair. I give up. And the next one: arg: Total obs: 22640554.0 (r12-i2)> (compute-left-entropy bad) $35 = 4.882936572523517e-6 (r12-i2)> (frqobj 'left-wild-entropy bad) $36 = 4.882277433514931e-6 How??? -------------------- Ongoing. ------ Merged into `] )` in 22717 secs Throw to key `bad-summation' with args `(compute-total-entropy "Left and right entropy sums fail to be equal: 7.244230580876298 19.30738867333117\n")'. Very first marginal fails. WTF. Yet it's OK in the starting object. (define j (cog-link 'ShapeLink (WordNode "James") (Connector (VariableNode "$connector-word") (ConnectorDir "-")) (Connector (WordNode "replied") (ConnectorDir "+")))) ... it exists ... (define bstars (star-obj 'left-stars bad)) (length bstars) as always. 
(frqobj 'left-wild-entropy j)
0.0 .. that's wrong.
(compute-left-entropy j) ; gives 0 .. that's wrong too.

Ohh. Pair entropies are missing! Because pair freqs were never computed...
(also: are left-wild-freq being updated? Yes, they are.)

In-group size=6: `—` `_` `)` `[` `(` `]`
------ merge-majority: Merge 34 of 6834 sections in 6 secs
------ merge-majority: Remaining 96 of 12183 cross in 12 secs
------ Merged into `— _ ) [ ( ]` in 130 secs
------ Find affected basis of (9201, 51065) in 249 secs
------ Recomputed entropies in 424 secs
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
Throw to key `bad-summation' with args `(compute-total-entropy "Left and right entropy sums fail to be equal: 9.150232146940487 19.412213341952143\n")'.

Failure to store!
* Need to store Shapes, these hold entropy marginals.
* Argh.... Cross-Sections hold
  (PredicateNode "*-FrequencyKey cover-section")
  needed for entropy marginals ... but these are never stored.

Conclude: it is impractical to track entropy, after all.
... unless the calculations are rejiggered to use raw count, instead
of frequency. This seems like the only reasonable option.
This requires expanding the compute-freq api...
Is this worth it? Not now, it adds overhead, and the stats are not that
interesting...
* alter matrix code
* alter agglo to not re-freq.
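One way the "use raw count instead of frequency" idea could work, sketched in Python. This is my guess at a possible rejiggering, not what the matrix code does; all function names are hypothetical. For a marginal with pair counts c and grand total N, -sum (c/N) log_2 (c/N) = (C log_2 N - sum c log_2 c) / N, where C = sum c. So one can cache only C and sum c log_2 c, which do not depend on N, and plug in the current N at read time.

```python
import math


def count_summary(counts):
    """Cache only N-independent quantities: (sum c, sum c*log2(c))."""
    c_sum = sum(counts)
    clogc = sum(c * math.log2(c) for c in counts if c > 0)
    return c_sum, clogc


def entropy_from_summary(summary, total):
    """Partial entropy recovered from the cached summary and the current N."""
    c_sum, clogc = summary
    return (c_sum * math.log2(total) - clogc) / total


def entropy_direct(counts, total):
    """Reference: the frequency-based partial entropy, recomputed in full."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


column = [4, 7, 1]
cached = count_summary(column)           # computed once, stored with the atom
before = entropy_from_summary(cached, 100.0)
after = entropy_from_summary(cached, 90.0)  # N changed; the cache is still valid
```

The cached pair stays correct when the total observation count changes after a merge; only marginals whose own counts changed would need a refresh.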
data/th/.DS_Store empty
data/th/words/words.adverbial.common
data/th/words/words.any
data/th/words/words.conj.common
data/th/words/words.connector
data/th/words/words.nominal.common
data/th/words/words.prep-n.nv
data/th/words/words.qf.common

Prachya Boonkwan

tar -cf len.tar --exclude learn-en/rootfs/data --one-file-system learn-en

tar -cf - --exclude learn-en/rootfs/data --one-file-system learn-en | ssh backlot "cd /home/lxc/tmp; cat | tar xf -"

cp -pr /home2/linas/src/novamente/data/rocks-archive/run-1-t12-tsup-1-1-1.rdb /data/lxc-databases/learn-en/rootfs/data

guile: symbol lookup error: /home/atomspace/src/cogserver/build/opencog/cogserver/shell/libjson-shell.so: undefined symbol: _ZN7opencog8JsonEval13get_evaluatorEPNS_9AtomSpaceE

stats:
* uptime. NetworkServer::_port, cpu use, mem use.
* list of connections, uptime of connection, last activity.
* Number of processed requests per connection.

get_use_count
ServerConsole
ConsoleSocket
ServerSocket

connection_stats
../cogserver/server/ServerConsole.h

   time_t _start_time;
   virtual std::string connection_header(void);
   virtual std::string connection_stats(void);

handle_connectio

   // Most recent activity
   struct tm tm;
   gmtime_r(&_last_activity, &tm);
   char buff[20];
   strftime(buff, 20, "%d %b %H:%M:%S", &tm);

ServerSocket::display_stats

   ServerConsole* con = req->get_console();
   oss << "Console max-open-sockets = " << ConsoleSocket::get_max_open_sockets() << "\n";
   oss << "Console curr-open-sockets = " << con->get_num_open_sockets() << "\n";

   // Count open file descs: dup() fails on any fd that is not open.
   int nfd = 0;
   for (int j=0; j<4096; j++) {
      int fd = dup(j);
      if (fd < 0) continue;
      close(fd);
      nfd++;
   }
   oss << "Process num-open-fds = " << nfd << "\n";

NetworkServer
cogserver()._networkServer

DATE            TID      RUN TYPE NREQ     LAST
12 Mar 05:47:54 12345678 r   scm  87654321 12 Mar 05:47:54

evalque.size() EQ
_eval_done EV bool
_pending_output.size() PEND

   bool eval_done() const { return _eval_done; }
   size_t pending() const { return _pending_output.size(); }
   size_t queued() const {
      return evalque.size(); }

opencog> stats
----- 13 Mar 21:38:10 2022 UTC ----
up-since: 13 Mar 21:06:39 2022
status: running  last: 13 Mar 21:37:51  tot-cnct: 203  port: 19014
max-open-socks: 10  cur-open-socks: 3  num-open-fds: 33
cpu: 535.614 secs  user: 518.073  sys: 17.540
maxrss: 6859108 KB  majflt: 34  inblk: 2149544  outblk: 132224

DATE            THREAD STATE U SHEL QZ E PENDG NLINE LAST-ACTIVITY
13 Mar 21:07:14  21170 iwait 1 cogs                9 13 Mar 21:38:10
13 Mar 21:32:29  19362 iwait 0 cogs                1 13 Mar 21:32:33
13 Mar 21:37:51   7811 iwait 0 scm   0 F     0     2 13 Mar 21:38:03

setsockopt (SO_KEEPALIVE) in NetworkServer.cc

ShellUTest.cxxtest:75: Error: Expected (reso.size() <= 265+1), found (267 > 266)

Thread 20 "cogserv:listen" received signal SIG32, Real-time event 32.
[Switching to Thread 0x7fffdfc5c700 (LWP 884441)]

terminate called after throwing an instance of 'opencog::SyntaxException'
  what(): Badly formed alist: (/home/linas/src/novamente/src/atomspace-git/opencog/persist/sexpr/ValueSexpr.cc:253)

13.6g 20m 43719 Columns: 43970 18544906.0

(check-gram-dataset cset-obj)
(cleanup-gram-dataset cset-obj)

((PredicateNode . 19) (ListLink . 4113614) (AnyNode . 4)
 (Connector . 17060) (ConnectorDir . 2) (ConnectorSeq . 204680)
 (Section . 833833) (EvaluationLink . 3899438) (TypeNode . 1)
 (SchemaNode . 1) (RocksStorageNode . 1) (WordNode . 13206)
 (LgLinkNode . 1))

(WordNode . 10850)
(WordNode . 13206)

check-word-marginals
(WordNode "Southwark")
9495, is left basis vs 13166 !???

(load-atoms-of-type 'Word)
(for-each fetch-incoming-set (cog-get-atoms 'WordNode))
(for-each fetch-incoming-set (cog-get-atoms 'Connector))
(for-each fetch-incoming-set (cog-get-atoms 'ConnectorSeq))

(load-atomspace)
(cog-report-counts)
$28 = ((PredicateNode . 32) (ListLink . 4110833) (AnyNode . 7)
 (Connector . 17068) (ConnectorDir . 2) (ConnectorSeq . 204764)
 (Section . 833919) (ShapeLink . 811170) (VariableNode . 1)
 (EvaluationLink . 3701473) (TypeNode . 3) (SimilarityLink . 20100)
 (SchemaNode . 1) (RocksStorageNode .
 1) (WordNode . 9497) (SentenceNode . 1) (ParseNode . 1) (LgLinkNode . 1))

After pairs:
((PredicateNode . 15) (ListLink . 3691978) (AnyNode . 2)
 (EvaluationLink . 3691978) (SchemaNode . 1) (RocksStorageNode . 1)
 (WordNode . 9280) (LgLinkNode . 1))

After sections
((PredicateNode . 18) (ListLink . 3906154) (AnyNode . 4)
 (Connector . 17060) (ConnectorDir . 2) (ConnectorSeq . 204680)
 (Section . 833833) (EvaluationLink . 3691978) (TypeNode . 1)
 (SchemaNode . 1) (RocksStorageNode . 1) (WordNode . 9496)
 (LgLinkNode . 1))

Q: why 204764 - 204680 = 84 conseqs?
Q: why 833919 - 833833 = 86 sections?

(define msec (cset-stars 'get-all-elts))
(length msec) ; 833833
(define is-msec? (make-aset-predicate msec))

(load-atoms-of-type 'Section)
No change! good, but huh? OK ....

(setup-initial-similarities star-obj 1000)
(setup-initial-similarities star-obj 1500)

She was thinking of ...    J-
She was thinking about ...
At first, it seemed like ... (it was going to rain)
At first, it seemed as if ... (it was going to rain)
It became clear that ...
... like it would rain.

LI+ done 11058
....n: J-; done
....n Cs- or Ce-; but CV ???
like.p LI- & C+ & CV+
....v: LI+; done

====================================================
====================================================
FIXED:
* Implement connector merging. ... half-done...
  DONE, I think, and unit tests too.
* Alter LG to allow multiple word definitions w/o subscripts.
  Amir fixed this, pull req #1204
* After merging, and then using fetched-pairs on sections,
  some/all merged shapes (that have been deleted) get recreated!
  FIXED
* Write unit test for the recent caching fix in the pattern matcher,
  it's doable... (fix circa 26 march 2021) #2803
  Done in #2812
* During merging, long class names are printed, is that correct?
  FIXED, it was wrong, because WordClass test was wrong
* Bug: RocksDB needs to be compacted on close. Closing and reopening
  a DB shrinks disk use by 20x for some workloads.
  Also it's chewing up vast amounts of RAM.
  Bugs #9 and #10 DONE.
* Create a generic run-shells.sh DONE.
* Change shape-vect to use ShapeLink, CrossSection. DONE
* Fix link-generator to do full combinatorial. DONE #1175
* Pair counting insanely slow and leaks RAM.
* learn.scm use include not load or export... DONE
* Alter dict generator to automatically place period at end of
  sentence. DONE. Also, to automatically produce left-wall and
  right wall; these need to be present for dict-compare to work
  without hackery. DONE.
* Enhance: revise atomspace start-cogserver to take args directly
  DONE.
* Similarity object needs to do all-pairs with word-classes and words.
  'get-all-elts in the stars object needs to look at several left
  types. Maybe TypeChoice?? DONE.
* Move 'accumulate-count to stars object. Also define a default
  'get-count and 'set-count on the stars object. DONE
  Reverted. It was a bad idea.
* After merging, there will be words paired to the new disjuncts.
  Similarities for those words need to be recomputed. Right now,
  they are not! (That explains the stability of the candidate list!)
  DONE early Oct 2021.
* The fast-math code works, but returns the incorrect pairs when used
  with direct-sum. Scheme hackiness seems impossible, because info
  has been lost. So this requires somehow naming the clauses in the
  pattern engine, but this is tricky in two ways. First, IdenticalLinks
  aren't pruned, as they should be. Second, when the top clause is
  a choice, it's not obvious that such pruning will recognize the
  attempt to name the clause. Ugh. Maybe punt on this, there's a
  better solution, I guess.
  DONE during coding spree, 3rd week Oct 2021.
* compute fmi can be 10x faster by using the query engine as above.
  DONE. Tested and benchmarked, 3rd week Oct 2021
* Some top-ranked merges have very few (less than 5%) of shared
  disjuncts. These should not be done, or the in-group should be
  tightened. Some in-groups seem too large by one member.
  Perhaps in-groups should have a minimum of 20% disjuncts shared by
  all members!?
  DONE late Oct 2021.
* Figure out why query patterns are not being cleared.
  DONE 19 Dec 2021 fold-api was making queries and not clearing them.
* Some in-group cluster names get re-generated and reused. See line
  159 of gram-majority.scm
  (define cls-name (string-join (map cog-name WLIST)))
  FIXED 21 Dec 2021
* W is not needed in reshape-merge DONE
* Left-right counts are mis-balanced after running in-group clustering
  for a while. Need to check that counts aren't leaking.
  in the print-matrix-summary report.
  3 days hard work, partly fixed, issue was that marginals were not
  being loaded.
  Wow. This took more than a week and a major redesign.
  DONE 28 Dec 2021
* recompute-mmt will be more efficient if we union all the right
  duals for all the words, instead of doing them one at a time.
  DONE 29 Dec 2021
* Use increment-count for everything. (cog-inc-count! ATOM CNT)
  and (add-stars-api 'move-count)
  The move-count is done already!? DONE, I guess.
* print-matrix-summary-report after MM^T shows nothing. Why?
  Because 'mmt-marginals computes only word-pair entropies. (NOT A BUG)
* Fix count transfer in add-singleton-classes (OBSOLETE)
* Cleanup marginals after clustering ... marginals might have zero
  counts for words, or words might not be in the basis? Which means
  the connector seqs need to be cleaned up too (as they are now
  unused/empty); they were screwing up counts! DONE
* mmt-q is handled incorrectly. It must not be stored with the
  similarities; instead, it should be added only at the last stage.
  Or better yet, left to be zero, and only added during debug
  printing. Actually, this is not a bug. It's a confusion. So FIXED.
* Expt-11 -- optimal-in-group -- change to use plain MI. DONE
* When ConnectorSeq's no longer occur in any sections, they still
  have marginals with non-zero counts, and are thus never removed.
  (PredicateNode "*-Norm Key-*") .. so support-obj
  I don't get it. I looked at the code, and this should be happening
  already. Maybe this was a hang-over from before? DONE.
  It was a bug.
* Add rarity and log-sizes to summary report. DONE

TODO/BUGS:
* Log the similarity fractions
* Finish work in link-grammar bug #1276 -sqlite sanity checks.
* Alter dict export to not require creation of singletons.
  They are way too slow and not really a needed step.
* When WordClassNodes are fully merged, they are not deleted.
* Setting MRG-CON to #f fails in at least half-a-dozen places.
* Bugs: quotes are not being escaped by submit-one.pl!!
  ... and worse, these are showing up as words with lots of
  backslashes in them. See issue #31.
  Ending quotes also hide punctuation so punct is not split.
* Set up direct-sum/shape clustering run scripts. (mostly done ?)
* cogserver is not flushing output on 2nd and later connections
  .. wtf! or just behaving incoherently/inconsistently with netcat.
  echo -e "(for-each (lambda (n) (sleep 1) (format #t \"ping ~A\" n)(newline)) (iota 10 1))\n.\n." | nc localhost 17001
* GC during pair counting happens too often. (that is, gc takes up
  too large a fraction of CPU.)
* Pair counting disappointingly slow.