MLOps Community

Agentic AI and the Arabic Language: Bridging Innovation and Linguistic Heritage // Hassan Sawaf // Agent Hour

Posted Feb 19, 2025 | Views 6
# Arabic
# Agents
# Linguistics
speaker
Hassan Sawaf
CEO & Founder @ aiXplain

Hassan Sawaf has 30+ years of experience in employing cutting-edge AI technologies for mission-critical business operations. He has founded several machine learning organizations in small and large tech companies that are leaders in their respective market segments today. The teams he started and managed are in Facebook/Meta, AWS/Amazon, eBay, SAIC/Leidos, and AppTek.

SUMMARY

Artificial Intelligence is transforming the way we interact with technology, and Agentic AI (systems that exhibit autonomy, adaptability, and decision-making capabilities) is at the forefront of this revolution. But what does this mean for the Arabic language, one of the richest and most complex languages in the world? As we advance AI-driven agents, ensuring they understand, process, and generate Arabic with the same fluency and nuance as English or other dominant languages is not just a technological challenge but a cultural imperative. In this speech, we will explore how Agentic AI can empower Arabic speakers, enhance accessibility, and preserve the linguistic heritage of over 400 million people while driving innovation across industries. The future of AI is agentic. The future of Arabic in AI depends on how we shape it today.

TRANSCRIPT

Hassan Sawaf [00:00:03]: All right, let's do it. All right, I have, yeah, as I said, a couple of slides, a few slides prepared. So I just want to tell you that I have been in AI for quite a bit of time. The first language model I built was in '95, '96. The first AI agent I built with my team was when I was at eBay back in 2015. And the agents which we built over there basically follow the architectures which we talk about today. So AI agent building is not that new. And the cool thing is AI agents enable us to go from idea to solutions in minutes, according to most of us.

Hassan Sawaf [00:00:48]: Right. But we need to put a question mark on this one a little bit and I'll tell you why. So AI agent building, I mean you have an idea and you want to build a solution and theoretically you expect it to just be building the business logic and go from there. And in 2015 when I was building the first agent of ours, you see here on the bottom actually a link to when we were giving that to the market, it took us 25 man-months, and we have a customer on the platform which basically built a similar solution within one man-month. So it's only 4 to 5% of the effort. Funny story is, within an hour they could actually get a feel for what it is going to be like to build that shop bot. I mean that's actually kind of cool. But what is important is that when you do just the business logic, you're going to end up with a demo, not with the actual solution which you're going to go out with.

Hassan Sawaf [00:01:51]: So at eBay we used more than 400 man-months of implementation to get to the pilot. So building the business logic was only 6% of the effort. The question is, where do the other 400 man-months go? Right. Reality is that development of the business logic is only part of a journey from idea to solution. We have other things like data organization, model selection, trust, safety, security, deployment optimization, monitoring and scaling. Those are things which need to be worried about as well and they take time as well. So basically developing the business logic is only 10 to 20% of the effort of building the product itself. So it's no wonder that some bigger businesses are saying that they can only slowly adopt gen AI.
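
The effort figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the variable names are illustrative, and treating "more than 400" as exactly 400 man-months is an assumption, so the second ratio is an upper bound on the share.

```python
# Man-month figures as quoted in the talk; names are illustrative.
BUSINESS_LOGIC_2015 = 25   # business logic of the first agent, built in 2015
BUSINESS_LOGIC_TODAY = 1   # a customer building a similar solution on the platform
PILOT_TOTAL = 400          # assumption: "more than 400" taken as exactly 400


def share(part: float, whole: float) -> float:
    """Return part as a fraction of whole."""
    return part / whole


today_vs_2015 = share(BUSINESS_LOGIC_TODAY, BUSINESS_LOGIC_2015)
logic_vs_pilot = share(BUSINESS_LOGIC_2015, PILOT_TOTAL)

print(f"Business logic today vs. 2015: {today_vs_2015:.0%}")
print(f"Business logic vs. whole pilot: {logic_vs_pilot:.0%}")
```

Under these assumptions the two ratios come out at about 4% and 6%, in line with the "4 to 5%" and "6%" mentioned in the talk.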

Hassan Sawaf [00:02:45]: It's taking them sometimes 18 to 36 months to adopt and they need something there. So we basically built some agents to help you on the journey of building that kind of capability. And those technologies are actually helping you from the ideation all the way to solution building. And you can utilize that on aiXplain. Some benefits: I mean we are allowing our users to build soloist agents, which are LLMs, and you basically connect some, how do you say that, some tools to it. I mean this community knows what agents are. I mean in 2022 we had orchestrator agents, where you basically are able to utilize multiple agents in orchestration to solve bigger problems. Strategist agents, where you basically have planning and plan execution.

Hassan Sawaf [00:03:47]: Guardian agents, where you basically allow policing agents to help your agent do the job right. Like what we call inspector and bodyguard, which basically help you do your job right. And then now this year we have an evolver agent where, through mutations, we are able to have the agent evolve and get better without supervision. I mean, quickly going through the platform: if you build an agent, you just give it a name and a description on our platform, and on other platforms similarly. And for us, we don't need you to identify what is the LLM behind it, what are the tools behind it. The platform figures it out for you. So you just basically run the agents, you can develop the tools, you can describe the policies you want to abide by.

Hassan Sawaf [00:04:48]: One of the important things is when you basically look at all this, it's all very language centric, right? Language is a critical component of how you are developing these agents. Agentic AI and the Arabic language is something which is close to my heart. I mean being bilingual myself, German and Arabic are my native languages. I looked into that and looked at basically the community of 400 million people who speak Arabic and analyzed it a little bit. So it is clear that during the agentic AI revolution, AI is no longer just a tool, it's evolving into agentic intelligence where AI systems demonstrate autonomy, adaptability and decision-making capabilities. This shift is also transforming how we interact with the technology, enabling AI-driven agents to learn, reason and collaborate. And then the question is, while AI is accelerating, is AI truly global? Is it equitable? Does it empower every language? And Arabic is the side where I'm coming from.

Hassan Sawaf [00:06:06]: And Arabic is one of the richest, most complex languages in the world, spoken by 400 million people, but it's underrepresented in AI. So if you are utilizing any LLM and you expect Arabic output and you compare it with the English output, the quality of the output is much lower in terms of richness vis-a-vis English and other dominant languages. And this is not only a technological gap; a cultural and economic imperative is there. So if we want to actually have the Arabic speakers fully participate in the revolution, we have to have agentic AI understand, process and generate Arabic in a higher quality. And without investment in that field, whether it's Arabic or other languages which are not predominant in the LLMs, we will see a digital divide. The people who speak English well and the people who are utilizing English as their language are going to see one result, and everyone who is not being represented as much is going to see a different result. So I see that for the community, of course, to be an opportunity. We need to build basically conversational AI agents to be strong in AI. Education, accessibility: AI needs to break the barriers for Arabic speakers. And then business and e-commerce agents should actually unlock new markets for Arabic-speaking consumers. And what is important is not only that we are serving the users, but also that AI is a critical component to preserve culture.

Hassan Sawaf [00:08:04]: When you look at dialects, poetry and linguistic heritage, etc. And again, I'm using Arabic here as an example, you can take the exact same arguments with many other languages which are spoken in the world. We speak 6,000 languages in the world and we are basically focusing only on, I don't know, between 5 and 16 languages to be strong, if we're talking about LLM building for example. So at aiXplain, we want to make these languages first-class citizens in AI. And so we want to provide a state-of-the-art development environment so that we can develop. I described it a moment ago. And what makes us different is that we have a marketplace where we have many, many models, 40,000 different models on the platform, some of them pre-trained in Arabic.

Hassan Sawaf [00:09:02]: We have 150 languages covered in the marketplace which can be customized to your use case. You can find data on the platform, you can fine-tune the model according to your needs. It's a collaborative marketplace. So it's not only aiXplain which is there, it's a community, it's an ecosystem. So I'd like to invite everyone over to collaboratively tackle this. And for Arabic and non-Arabic languages, I mean all languages, it's not only English and Arabic and German and French, there are plenty of other languages. So to close down here, the future of AI is agentic. I'm not telling anything new to this community here.

Hassan Sawaf [00:09:46]: Everyone knows, I mean that's probably a component which is relevant to us. And for us Arabic speakers, the future of Arabic in AI depends on the innovations which are in this room, and that's true for Arabic and non-Arabic, as I said. And whether you are a startup, a researcher, a government entity or an enterprise, we do invite you to build the next generation of Arabic AI agents on aiXplain or, again, in one of the 150 other languages which we want to cover and support. And this way we are not only adopting AI, but we are also shaping the future of AI, in particular because we have a collaborative framework. We are strong believers in open science, strong believers in open source and whatnot. And the AI is going to grow with all of us growing. We need to make sure that it is intelligent, capable and influential in languages other than English, German, French and Spanish, etc. So that's basically in brief what I wanted to share with everyone.

Hassan Sawaf [00:11:06]: So, yeah, so this is us.

Demetrios [00:11:10]: Excellent. I really like that. I think it's super cool to look at my first question while everybody else feel free to go off of mute and jump in and ask a question. I've heard that with like Arabic there's something that you do in training because it's a language that you read right to left and not left to right. The training is right to left and left to right. Do you know anything about that?

Hassan Sawaf [00:11:45]: Oh yeah. So in essence, the way we train all these models, we make them time synchronous, which means it doesn't matter whether it's left to right or right to left or top to bottom. I mean, some languages are not even written left to right or right to left, right? Like Chinese, you can write top down. So basically, if you take the time, how do you call that, use the time as something you align your conversation or your content with, this is probably the best way. We used to, I mean in the 90s, in the 80s, we used to actually look at how it's visually looking.

Hassan Sawaf [00:12:30]: And then we used to have those problems which you are actually touching upon. But that changed in the late 90s, I think, when we basically realized it doesn't really matter, it's basically a matter of time. So your scale is basically time here, and the characters are placed along it. Whether they're written right to left or left to right is just a visual matter, so to say.
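
One way to see the point about order versus appearance, assuming the speaker's "time" axis corresponds to what Unicode calls logical order (an interpretation, not something stated in the talk): a string is stored in the order it is read, and right-to-left layout is applied only when the text is displayed. The sketch below uses Python's standard `unicodedata` module; the sample string and the helper name `logical_order` are illustrative.

```python
import unicodedata

# "marhaba" (hello) in Arabic followed by Latin text, written with escapes
# so the storage order is unambiguous regardless of how an editor renders it.
SAMPLE = "\u0645\u0631\u062d\u0628\u0627 AI"


def logical_order(text: str) -> list[tuple[int, str, str]]:
    """List (index, character name, bidi class) in storage (reading) order."""
    return [
        (index, unicodedata.name(char, "UNKNOWN"), unicodedata.bidirectional(char))
        for index, char in enumerate(text)
    ]


for index, name, bidi_class in logical_order(SAMPLE):
    print(index, name, bidi_class)
```

The first entry is the letter that is read first (meem, bidi class `AL`), even though it is drawn rightmost on screen. Anything that consumes the string sequentially sees the characters in this reading order, and the direction only matters to the renderer.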

Demetrios [00:12:58]: And I'm assuming that folks need to have a strong grasp of the language to be able to contribute at all.

Hassan Sawaf [00:13:10]: Yeah, I think it's good and helpful to have a good grasp of the language, of course, because at the end of the day you're teaching your agent or LLM or whatever it is something, and if you are aware of cultural nuances and so forth, it's going to be helpful for the application you're building at the end of the day, so definitely helpful. But at the same time we're breaking down barriers. The world, while it's growing, is shrinking too. I mean we have technology to be able to communicate beyond the borders. Right. I mean everyone uses Google Translate, everyone can actually build, up to a certain degree, the agents they're building to be language agnostic, or language comprehensive more than language agnostic. And.

Hassan Sawaf [00:14:13]: But we need to push more. I mean today when you ask the LLM certain things which are very specific to some countries, you're going to have a result coming back from the LLM where you're saying, "You are not understanding my culture, you're basically answering something which is off." I mean, this is inappropriate almost, sometimes, right? For some languages more and for other languages less. And the more difference there is between, I don't know, like let's say Meta's first version of Llama, right. It was basically trained almost only on English. Right. And you would ask questions and the results might come back and then they're not completely off.

Hassan Sawaf [00:14:58]: With the Llama 3 version, with the latest things, they paid attention to this and retrained the system to be much stronger, giving back the results you really expect. I mean it's not 100%. None of the language models are 100% in any language actually. But I mean it's much better today than it used to be with Llama and Llama 2 and whatnot.

Demetrios [00:15:24]: Yeah. Taking into account that nuance is huge.

Hassan Sawaf [00:15:27]: Aditya.

Demetrios [00:15:28]: I see you got your hand up. Jump on man. What you gotta ask?

Hassan Sawaf [00:15:34]: Of course.

Aditya K [00:15:34]: Thanks Demetrios. Hey Hassan. Nice to meet you. My name is Aditya, I go by Adi. And I was very inspired by what you just shared with us because I'm also a fellow language enthusiast. I love learning different languages from different cultures. And one thing that really makes me sad sometimes is that my native tongue is Telugu and it's a South Indian language. And a lot of Indians, Indian Americans who are in the software engineering field speak this language.

Aditya K [00:16:12]: And a lot of them are very kind of, let's just say, experienced with it and all. However, what saddens me is that because of modernization a lot of the ancient culture and cultural tradition and wisdom that, you know, got passed from family to family, generation to generation, has been lost in the modern Internet age because of globalization, everybody trying to aspire to one goal, one type of lifestyle. One thing I will say that kind of warmed my heart a lot is when I introduced my grandma, who speaks only Telugu, to ChatGPT and she actually started using it and speaking to the tool in Telugu, and she would use like the camera feature to scan like an English newspaper and she would ask it to translate, you know, whatever it is into Telugu and read that. And the fact that she was using it without me being there to help her out, that really warmed my heart. So I really relate to you on how passing on that wisdom through AI is very important. But to my question, you said that in order to have a democratic approach to making sure every single language, every single culture gets onboarded onto AI and they hop onto this revolution, we need like a lot more involvement. So how can people from different parts of the world whose language right now is not really that active within the gen AI space, how can they ensure that their language gets onboarded onto AI per se? Is it more people that we need? Is it more engineers? Like, what are the steps?

Hassan Sawaf [00:18:03]: That's a great, great question actually. So first of all, your story about your grandma, I can fully relate to. I love those stories actually. That's something that thirty years ago, when I started working in AI, was only a dream for us. I mean that basically, literally my grandma can do something with AI. That's really awesome. To your question, there are multiple ways to get involved.

Hassan Sawaf [00:18:34]: I mean, building apps which are paying attention to language is one thing. I mean, at the end of the day, if you build your apps and your agents and you utilize language, hopefully you're able to collect some of the data. I mean, of course you have to pay attention to privacy considerations and whatnot. But if you can collect some data which can actually then be fed back into your own agent, that would already start building foundations for other people to build on top of. Imagine you're building an agent and your agent can be hired by another company to build bigger solutions with that and so forth. At the end of the day, you are basically helping not only your own app, your own agent, but also other things. Another approach is. Another thing is.

Hassan Sawaf [00:19:34]: And they're running in parallel, should be running in parallel, not individually. Right. Another thing is if someone builds, for certain languages, certain very awesome tools, I mean, let's say, for example, I don't know, named entity recognition or a sentiment analyzer or something like that, and you make it available for the world and allow it to be used by LLM developers. For example, if you're doing an awesome job, you're solving your problem and you can actually let them utilize it for generating data for themselves, solving their problem. That's number two. So the first is building agents to develop use cases that feed back into your own agent. Whoops, sorry about that. And then the second thing is building core tools which can be utilized for data annotation. And then data itself.

Hassan Sawaf [00:20:42]: I mean if you have data sets which you are willing to share for some people to license. I mean this is why I actually started aiXplain as a marketplace. At its heart there is a marketplace, because I believe that people should be able to contribute with data, but not only give data for free, but also allow monetization. Say I have built data for myself. I want to put it in a place where everyone can utilize it, but pays me a certain amount of money. Then I even feel comfortable sharing that data, vis-a-vis someone utilizing my data without me knowing, etc.

Hassan Sawaf [00:21:22]: And then all the discussions come: who owns what kind of knowledge, who owns what kind of IP, etc. So this is the third thing. I mean you can actually utilize our marketplace to help. So we have some companies actually licensing data through the marketplace. And some data is small, some data is large. Depends on what kind of data it is. Big LLM developers are interested in any kind of data they can actually take in if it's diverse. And if we're looking at language, the language diversity is definitely core to their interest.

Hassan Sawaf [00:22:00]: I mean there is no question there. So those are three. There's probably 100 more, but this is three which come to my mind immediately.

Aditya K [00:22:10]: For sure. That definitely makes sense. I guess a little example was that at my workplace we created a voice audio recording of us discussing a certain technology within the company and fed the transcription of that into a RAG tool to create a small, like, SLM around that. So maybe something like that within different languages could also be a potential.

Hassan Sawaf [00:22:40]: Definitely. And if you make that available through a marketplace for everyone to utilize, you compound the value at the end of the day. For everyone. Yeah, definitely.
