Is there a general characteristic of simple programs that are able to learn complex behaviors, such as neural networks or RL-based algorithms that can be implemented in fewer than 100-250 lines?

I don't know yet.

This question came to me after thinking about DNA as the code of human beings. DNA is also found in other animals, and even in some viruses. Organisms use nucleotides to store the programs that are necessary to their existence. DNA is used to produce proteins within the body that accomplish various functions such as regulating our body, controlling our mood, our attention, our hunger, etc.

This code has been evolving since before there was any biology at all. From a large amount of randomness (chemical elements), nucleotides were formed, which then somehow led to the formation of DNA itself after a likely long process. If through randomness we moved from a chaotic world to one with order and structure, where a chain of DNA could finally emerge, it would be interesting to investigate the process in further detail to determine whether it could give us clues about how to create a program that could evolve the same way DNA did.

Cellular automata are also interesting to study in that respect. By defining a small set of rules, it is possible to generate and observe complex behaviors.
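To make the "small set of rules" point concrete, here is a minimal sketch of an elementary (one-dimensional, two-state) cellular automaton. The function names `step` and `run`, the wrap-around edges, and the choice of Rule 30 are my own assumptions for illustration; nothing above depends on this particular automaton.

```python
def step(cells, rule):
    """Apply one update of an elementary cellular automaton (edges wrap around)."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        # The three neighbouring cells form a 3-bit index into the rule number.
        index = (left << 2) | (center << 1) | right
        nxt.append((rule >> index) & 1)
    return nxt


def run(rule, width=31, steps=15):
    """Start from a single live cell in the middle and collect every generation."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history


if __name__ == "__main__":
    # Print each generation as a row of '#' (alive) and '.' (dead).
    for row in run(rule=30):
        print("".join("#" if c else "." for c in row))
```

The entire "program" of the automaton is the single integer passed as `rule`, yet some rules produce patterns that look far from simple when printed.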

One common behavior of cells is that they reproduce. As such, I would expect a program that can learn complex behaviors to have some reproductive function. Reproduction is considered one of the traits of a living entity. My idea here is that exploring how we were able to massively populate the Earth may provide us with ideas on how a bit of code learned to lengthen itself, in the process increasing the size of its host as well as the complexity and variety of the cells that compose it.

Will an AGI be superior to a large group of individuals (e.g., society or a company)?

Most likely.

An AGI may be a strong single-minded entity. Unlike societies and companies that are composed of numerous individuals with different values/beliefs/opinions (VBO), an AGI is expected to have a single set of clear, concise and non-contradicting VBO. An AGI should be able to explore all potential alternatives and reason about all the potential sets of VBO in order to determine the most coherent and appropriate set to hold.

Meanwhile, we as individuals hold VBO that are often inconsistent. As a group, we are heterogeneous in our VBO, which means that conflict will arise since some sets of VBO cannot coexist. Our biggest issue is that we are competitive by nature. People fight over resources if they are limited. Fighting leads to winners and losers. The winners may not necessarily be the individuals with the "best" set of VBO. If the "fittest" set of VBO wins instead of the "best" one, we are unlikely to produce the optimal solution to a desired goal.

(This assumes that there is a single "best" VBO set and not various VBO sets in a heterarchy of VBO.)

07 Jan 2020

Organizing unread content


How can I organize all the webpages I never read?

  1. Delete articles you know you will never read
  2. Add articles you'd like to read one day to a system such as Pocket
  3. Track when articles are added to your "must-read" list; after 1 year, "graduate" them by deleting them from the list
  4. Estimate how long an article takes to read and how much value you expect it will bring you
  5. Use the ROI to order your reading list

The following method applies to content online as well as offline (magazine articles, books).

The first, most important, and most difficult strategy is to simply let go of those articles. Most of the time, we keep certain things out of fear of missing out. We may also think that if at some point we have free time we'll go through them, but that never happens. We always try to find something new instead. This can be seen as a way for the mind to communicate that it doesn't think this content is worth its time, so it's better to drop it and look for alternative content to read.

Once you've gotten rid of all the articles you decided you would never read, you can add the remaining ones to a tracking system such as Pocket. The idea here is that you may read this content, but in contexts other than sitting in front of your computer. Maybe you'd be likelier to read the article if you're waiting in line or waiting for your bus/subway. Maybe you'd read it if you're on your way to work.

Track when you add articles to your list. The older an article becomes, the less likely you will be to read it. As articles reach a certain age, it might be time to graduate them to the graveyard, in other words, to never read them. Mark them as read or remove them from your reading list.

You should have an idea of how long an article takes to read. Pocket offers an estimate of how long it takes to read an article. Knowing how long it takes to read an article is important since one of the techniques to get rid of articles is to go through all the short articles first since the time investment may be low.

As in the case of task management, the strategy we will want to adopt to organize and prioritize our reading is based on the return on investment (ROI) metric. Each article should have an estimate of the value reading it will bring you (e.g., how much you would have paid to acquire this knowledge), as well as an estimate of the effort (duration, e.g., how long it takes to read the article according to Pocket) necessary to go through it. You can then order your reading list from the highest ROI to the lowest ROI and feel more confident that you are reading high (expected) value content.
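The last three steps of the list (track age, estimate effort and value, order by ROI) can be sketched in a few lines. This is only an illustration under my own assumptions: the `Article` fields, the `prioritize` function, and the choice of "value divided by minutes" as the ROI formula are hypothetical, and the value estimates are whatever consistent unit you choose.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Article:
    title: str
    added: date      # when the article was put on the list
    minutes: float   # estimated reading time, must be greater than zero
    value: float     # estimated value, e.g. what you would pay for the knowledge

    @property
    def roi(self) -> float:
        # Assumed definition: expected value per minute of reading.
        return self.value / self.minutes


def prioritize(articles, today, max_age=timedelta(days=365)):
    """Drop articles older than max_age, then sort the rest by ROI, highest first."""
    kept = [a for a in articles if today - a.added <= max_age]
    return sorted(kept, key=lambda a: a.roi, reverse=True)


if __name__ == "__main__":
    reading_list = [
        Article("Long survey", date(2019, 12, 1), minutes=30, value=15),
        Article("Short how-to", date(2019, 11, 1), minutes=5, value=10),
        Article("Forgotten essay", date(2018, 1, 1), minutes=5, value=100),
    ]
    for article in prioritize(reading_list, today=date(2020, 1, 7)):
        print(f"{article.roi:.2f}  {article.title}")
```

Note that the old article is dropped even though its ROI would have been the highest, which is the point of the "graduation" rule: age is treated as evidence that the value estimate was too optimistic.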

How can I easily identify the next book I should read when I have over 500 to choose from?

  • Pick highly read books
  • Determine why you want to read; this will filter out many books from your list
  • Track how long you've wanted to read a book; you're more likely to read ones you recently added to your list
  • Fiction: pick based on your tastes of the moment
  • Non-fiction/technical: select a few books on the same topic, skim through them, then pick the best

My first heuristic when deciding which book to read is to consider how many people have already read it, using a site such as Goodreads. The reason is that I want to read books that I may be able to discuss with others who have also read them. Reading niche books might be interesting, but it makes discussing them a lot more difficult.

For fiction books, I read books from a collection in which I've already enjoyed at least one book. You could basically consider it book "social" proofing. For new books that are not part of a collection and are from authors I've never read, I mostly decide based on my interests of the moment.

For non-fiction/technical books, I skim through a few books on the same topic and determine which book I feel the most confident will provide me with the most information presented in the most appropriate and succinct way.

Determining why you want to read will help you figure out what the most appropriate next book might be. You may want to relax and thus reading a technical book may not be appropriate while a fiction book would be. You may want to learn a new programming language and thus reading about a programming language you already know will not achieve that goal.

I suggest using a book tracker like Goodreads as it will allow you to track when you added a book to your list. This will let you know how long the book has been sitting in your reading list, waiting to be read. Generally, the longer a book stays "shelved", the less likely it is that you will ever read it. In other words, most books are read in a LIFO (last in, first out) fashion.

How would you build an AI that could offer coaching for games like StarCraft or Dota/LoL?

I see coaching as similar to the loss function of a machine learning model. I also see coaching as trying to optimize a (program's) function by figuring out where the largest improvements can be made. In order to provide effective and useful feedback, a coach should focus on the areas where the player shows the most potential for improvement. In a game like StarCraft, that would mean first pointing out the macro-level mistakes, then the micro-level mistakes.

To coach, you need to have a model of which actions have an impact on the game and how much impact they have. A learning system like AlphaStar generally exhibits two types of learning: learning from observation (human replays) and learning by playing against versions of itself.

The most common way for players to learn is to observe players more successful than themselves. The learners may watch better players play the game and comment on their own gameplay, or they may watch someone review a replay and provide their own analysis. Both of these cases can be seen as models (the players) trying to explain their internals (the logic behind their actions).

Coaching generally starts by trying to reproduce the recipe of someone else. You may not understand why they are doing certain things, but you do it yourself and you observe the results. As you practice repeating those same observations => actions, you try to reproduce as closely as possible what the better player would do.

In the first learning phase, the model simply observes what happens during gameplay. In competitive games such as MOBA/RTS, the only reward signal is the victory/loss at the end of a game. As human beings, we quickly learn that winning a fight/encounter is good and losing it is bad. Those give us intermediate reward signals that an AI agent may not be able to build right away since it is conceptually difficult to determine when an encounter begins and ends. The agent could however learn a simple metric such as the sum of the health of all its units, where keeping this value high is generally a good thing.
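As a rough sketch of that health-based metric, the intermediate reward between two observations can be taken as the change in total health. The snapshot format (a list of dictionaries with a `health` key) and the function names are assumptions I am making for illustration, not the API of any real game.

```python
def total_health(units):
    """Sum the health of every unit in a snapshot."""
    return sum(unit["health"] for unit in units)


def health_reward(previous_units, current_units):
    """Intermediate reward: how much total health changed between two snapshots.

    Negative when units were damaged or lost, positive when units were
    healed or new units were produced.
    """
    return total_health(current_units) - total_health(previous_units)


if __name__ == "__main__":
    before = [{"name": "soldier", "health": 100}, {"name": "scout", "health": 50}]
    after = [{"name": "soldier", "health": 100}, {"name": "scout", "health": 20}]
    print(health_reward(before, after))  # the scout lost 30 health
```

A signal like this is available at every time step, unlike the win/loss outcome, but it is only a proxy: trading health for a strong position can be the right move, so it would complement the final reward rather than replace it.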

The model will need at some point to be able to establish its own scoring system so it can give itself some intermediate rewards during a game. It will also need to learn how to segment a sequence of actions into repeatable action units such as constructing unit X, attacking player Y, defending zone Z. As such, it may deem that constructing unit X is worth 5 units of reward, attacking player Y is worth no reward and that defending zone Z is worth 30 points of reward. The value of rewards may vary based on numerous factors, such as how much time has elapsed since the beginning of the game, the known enemy army composition, existing vision, etc.

Actions such as "attack coordinate X, Y" are too low-level. Your model will have to learn hierarchically complex actions such as "attack player X", "attack the gatherers of player X", "attack the weak gatherers of player X", etc., which will then translate down the hierarchy to "attack unit at coordinate X, Y".

An AI coach may look at hundreds or thousands of replays and observe the distribution of unit allocation after 1, 2, 3, 5, 10, 15, 20 minutes (or every 5 seconds) into the game and its correlation to whether the player won or lost. It may look at the items purchased by the player in a MOBA game, their timing, and their correlation to whether the player won or lost. For a human being to do something similar would require a lot of time. Most would probably write scripts to automate the process of collecting those details instead of manually going through the replays one by one.
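Such a script could start out very small. The sketch below groups replays by how many units of a given type the player owned at a given minute and reports the win rate of each group, which is a crude stand-in for a proper correlation measure. The replay structure (`won`, `units_at`) is hypothetical; a real replay parser would have to produce it.

```python
from collections import defaultdict


def win_rate_by_count(replays, unit, minute):
    """Map 'number of `unit` owned at `minute`' to the fraction of games won."""
    games = defaultdict(int)
    wins = defaultdict(int)
    for replay in replays:
        # Missing snapshots or unit types count as zero units.
        count = replay["units_at"].get(minute, {}).get(unit, 0)
        games[count] += 1
        wins[count] += int(replay["won"])
    return {count: wins[count] / games[count] for count in sorted(games)}


if __name__ == "__main__":
    replays = [
        {"won": True, "units_at": {5: {"worker": 30}}},
        {"won": False, "units_at": {5: {"worker": 30}}},
        {"won": False, "units_at": {5: {"worker": 20}}},
    ]
    for count, rate in win_rate_by_count(replays, "worker", minute=5).items():
        print(f"{count} workers at 5 min: {rate:.0%} win rate")
```

With only a handful of replays per group the rates are noisy, so in practice the counts would need to be bucketed and the sample sizes reported alongside the win rates.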

Playing against yourself is more complicated. A perfect recording of your actions may not prove difficult to beat. It may send units to the wrong location on the map, it may be caught off guard moving to a location while you have positioned units in the middle of its path, it may react to an attack that the "replay" opponent had sent to its base at one point in the original game, etc. It is however a start, one example you can train against.

A lot of players who are invested in the game do theorycrafting, which basically means using logic and reasoning to assess what to do in specific situations. They simulate various potential cases in their head and devise plans to defeat them. While a human being may be able to devise a few dozen simulations over an hour, a computer may be able to generate hundreds of thousands. It may also be able to test them in a more accurate simulation environment. When game patches are released, it could rerun all the simulations it had generated to determine the impact of the patch on its existing strategies.

An AI coach can be provided the game rules, specifically which units are weak/strong against other units, and watch the game while you are playing. If you attack your opponent and the AI observes a strong concentration of a specific type of unit, and it notices you do not have any of the units that counter this unit type, it may suggest that you start building those as soon as possible. It may also notice that your unit composition is weak against the unit composition of your enemy and suggest units to build to balance your army and be better prepared for the next encounter.
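A minimal sketch of that rule-based suggestion could look like the following. The unit names, the `counters` table, and the 40% concentration threshold are all made up for illustration; a real coach would take them from the game's actual rules and from data.

```python
from collections import Counter


def suggest_counters(enemy_units, own_units, counters, threshold=0.4):
    """Suggest units to build against heavily represented enemy unit types.

    enemy_units / own_units: lists of unit type names that were observed.
    counters: maps an enemy unit type to the unit types that beat it.
    threshold: minimum share of the enemy army for a type to matter.
    """
    enemy = Counter(enemy_units)
    total = sum(enemy.values())
    owned = set(own_units)
    suggestions = []
    for unit_type, count in enemy.most_common():
        if count / total < threshold:
            continue  # not a strong concentration
        options = counters.get(unit_type, [])
        # Only suggest when the player owns none of the countering units.
        if options and not owned.intersection(options):
            suggestions.append((unit_type, options))
    return suggestions


if __name__ == "__main__":
    rules = {"flyer": ["archer"], "grunt": ["knight"]}  # hypothetical game rules
    seen = ["flyer"] * 6 + ["grunt"] * 4
    for enemy_type, build in suggest_counters(seen, ["grunt", "grunt"], rules):
        print(f"Enemy is massing {enemy_type}: consider building {build}")
```

This only covers the first half of the idea (a missing counter to a concentrated unit type); judging whether a whole army composition is weak against another would need a richer model than a lookup table.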

We can see this act of theorycrafting as the equivalent of knowing, at a high level, the strategies and counter-strategies one can employ at an early point in the game, the same way you can learn the different opening moves in chess.

In the case of learning by playing against yourself, what we want the AI coach to provide us is an opponent that will challenge our current biggest weaknesses so we can address them. In many cases certain specialized strategies will be extremely strong against a specific type of strategy and we will want to know those cases so we can use those strategies when the time is right.

  • Determine your weaknesses/areas of improvement
  • Suggest potential approaches to solve your recurrent problems
  • Suggest heuristics that are easy for human beings to understand and follow
  • Simulate opponents that would exploit your current weaknesses so you can practice against them
  • Collect various gameplay-related statistics and their associated success rates (number of units of type X after Y minutes, number of creeps killed after X minutes, item purchase order, build order, etc.)

  • If you were in an environment where you had access to very few replays, how would you learn the most out of those available?