ChatGPT, released at the end of 2022, showed the world that AI can explain things "fluently, like a human." However, as the expression "contrary to one's words" suggests, a person's words, tone of voice, and facial expressions do not always match. Going forward, human-AI communication will need a breakthrough on this issue.
Humans communicate through three modalities (Verbal, Vocal, Visual: the 3Vs), but text-based AI has only the Verbal one. Prompt engineering is therefore important for helping single-modal AI understand human intentions accurately.
If AI is to surpass human capabilities in its output, the comparison is fair only when its input is at the same level as a human's. Since GPT4.0, multimodal generative AI that takes all 3Vs as input has become mainstream.
AI can read and summarize vast amounts of illustrated material in an instant, spot absurdities in images, and even understand humor. Its emotion recognition has also evolved: given the 3Vs of characters in a TV drama as input, it can recognize emotions such as "joy," "surprise," and "sadness." AI is expected to play an active role in a wide range of fields.
Even if AI can take the same input as humans and produce the same output, its thought processes are completely different. Various methods, such as GANs and diffusion models, are being put to practical use, and the day will come when AI surpasses humans.
Report format: PDF (7.6MB)
Original data: PowerPoint, 78 slides, A4 size


