Vetro Source Lab.

Research note 07

Why does AI identify the same Italian business differently?

Answer instability is not just random variation in wording. In Italian business queries, repeated runs can reveal unstable source paths, weak category anchors and identity substitutions that appear only when the same prompt is recorded more than once.

Recorded by Ehsaneddin Asgari · March 27, 2026

A generated answer can be correct in one saved run and wrong in the next, without any public fact changing. The useful signal is the pattern of movement: which part of identity keeps slipping.

In a composite run set, Vetro Source Lab asks several systems a plain question about the same Italian design and home retail company. One answer calls it a Milan retailer. Another calls it an Italian furniture manufacturer. A third gives a careful paragraph on design shopping, then cites a profile that belongs closer to a reseller than to the company itself. The name survives. The identity does not.

The strange part is how reasonable each answer sounds in isolation. None reads like nonsense. Each has a source-shaped shadow behind it: an owned page, an English commerce profile, an old directory entry, a reseller mention. Only after the lab places the runs side by side does the movement become visible. The business is being reconstructed differently each time.

Instability is a pattern, not just a mood

Answer instability, meaning repeated change in the identity assigned to the same business across comparable prompts, matters because it can expose weak public signals that one clean answer hides. This working definition keeps the review away from a familiar distraction. Generated text always varies. The lab is not interested in every change of phrasing. It is interested in changes of name, place, branch, category, source support or current status.

A stable answer may rewrite its sentences while preserving the same identity. An unstable answer changes the business it appears to know. That difference is easy to miss when a marketer tests one prompt, saves one satisfying paragraph and stops. The first answer may have taken the best possible source path by chance. The next may follow a weaker path and attach the company to the wrong category.

The lab’s method treats one answer as an observation, not a conclusion. Several observations are needed before the team names a pattern. This is not caution for its own sake. In Italian business queries, the public evidence is often layered: legal names, trade names, surnames, maps, directories, tourism pages, commerce profiles, branch records and reviews. A single run may touch one layer. The next may touch another.

That is why the AI answer record includes prompt, answer, query language, visible citations, source path, business identity, location assignment, category assignment and mismatch. Repetition gives those fields a reason to exist. Without repeated records, instability remains a feeling: “the model changed its mind.” With records, the team can see which part moved.
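
The record fields listed above could be held in a small structure like the one below. This is a minimal sketch, not the lab's actual schema: the class name, field names, types and the company "Esempio Design" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnswerRecord:
    """One saved run. Names and types are illustrative assumptions."""
    prompt: str
    answer: str
    query_language: str                       # e.g. "it" or "en"
    visible_citations: List[str] = field(default_factory=list)
    source_path: Optional[str] = None         # None when no path can be identified
    business_identity: Optional[str] = None
    location_assignment: Optional[str] = None
    category_assignment: Optional[str] = None
    mismatch: Optional[str] = None            # short note; None if nothing slipped


# Hypothetical run for a made-up company name.
run_1 = AnswerRecord(
    prompt="Che cos'è Esempio Design?",
    answer="Esempio Design è un rivenditore di design a Milano.",
    query_language="it",
    visible_citations=["owned page"],
    source_path="owned page",
    business_identity="Esempio Design",
    location_assignment="Milan",
    category_assignment="retailer",
)
```

Keeping `mismatch` as a free-text note rather than a score matches the qualitative stance of the review: the record says what slipped, not how badly.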

The source path may rotate under the same prompt

In repeated runs, the most visible change is often citation rotation. One answer cites an owned page. Another cites a directory. A third cites a travel or commerce surface. Even if the prompt is similar, the model may assemble the answer through a different route.

The lab does not assume that every platform exposes its full retrieval behavior. Sometimes the source path is visible. Sometimes it is implied. Sometimes no path can be identified. Still, comparing cited or suggested evidence across runs often reveals why the identity shifts. A directory may carry an old branch. A commerce profile may use broader category wording. An owned Italian page may preserve the legal name but say less about visitor-facing services.

The design-retail composite is useful here because each source surface has a different temptation. The owned Italian page supports the company as a retailer with a particular identity. The English commerce profile makes the business easier for foreign users to place, but it broadens the category. A reseller page introduces product language that can sound like manufacturing. An outdated directory gives a neat label but may not match the current company description.

When the source path rotates, the answer may appear unstable even though the public web has not changed. The model is not necessarily hallucinating from empty air. It is selecting a different public fragment and letting that fragment carry more identity weight than it should. The instability becomes a source-preference problem.

This is one reason the lab avoids overreacting to a single wrong answer. A bad run can be noise. A repeated return to the same weak source type is more serious. It suggests the business has an identity surface that is easier for models to reuse than the one the business would prefer.

Which part of identity slips?

Vetro Source Lab uses the classification anchor from its canon to sort the movement: named correctly, placed by proxy, categorized by borrowed wording, cited through a weak source. In repeated runs, the question becomes dynamic. Which of these holds steady, and which keeps changing?

A business may be named correctly in every answer. That can create false comfort. If the place assignment drifts from Milan to Lombardy to a nearby province, the name anchor is strong but the place signal is weak. If the category shifts from retailer to manufacturer to design studio, the category anchor is unstable. If the citation changes from an owned page to an old directory to a reseller page, support is rotating under the claim.

This typology is qualitative. It does not rank errors by severity, and it does not measure frequency. Its value is practical. It turns “AI is inconsistent” into a sharper question: is the instability about name, place, branch, category or source support?
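
The question of which part moves can be sketched as a comparison of identity fields across saved runs. The field names, the dictionary layout and the sample values below are assumptions, loosely modeled on the design-retail composite. In keeping with the typology, the function does not rank or count anything: it only separates fields that kept one value from fields that did not.

```python
IDENTITY_FIELDS = ("name", "place", "branch", "category", "source_support")


def held_and_slipped(runs):
    """Split identity fields into those that held and those that slipped.

    A field holds when every run gives it the same value. A field missing
    from every run is treated as unchanged, which is a simplification.
    """
    held, slipped = [], []
    for field_name in IDENTITY_FIELDS:
        values = {run.get(field_name) for run in runs}
        if len(values) == 1:
            held.append(field_name)
        else:
            slipped.append(field_name)
    return held, slipped


# Hypothetical runs: the name and place hold, the category and support move.
runs = [
    {"name": "Esempio Design", "place": "Milan",
     "category": "retailer", "source_support": "owned page"},
    {"name": "Esempio Design", "place": "Milan",
     "category": "manufacturer", "source_support": "reseller page"},
    {"name": "Esempio Design", "place": "Milan",
     "category": "design studio", "source_support": "old directory"},
]

held, slipped = held_and_slipped(runs)
print("held:", held)
print("slipped:", slipped)
```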

In the restaurant-group composite, the repeated runs produce a different shape. The family name holds across answers. The branch boundary slips. One answer describes the historic location. Another imports lake-facing review language from the newer branch. A third speaks about the group as if all branches share the same visitor profile. Here the instability is not primarily category drift. It is branch-scope drift.

In the design-retail composite, the category moves more than the place. The company remains broadly Italian and often Milan-associated, but its role changes. The model cannot decide whether it has retrieved a retailer, design brand, manufacturer, reseller network or broad home-furnishing source. That movement points to mixed category wording across public surfaces.

The lab’s position is that instability should be read by the part that moves. Otherwise all errors blur into one complaint. A wrong province, a wrong category and a weak citation are different failures even when they appear inside the same paragraph.

Italian and English runs do not wobble the same way

Language path is one of the strongest reasons repeated answers diverge. Italian prompts and English prompts may retrieve different evidence surfaces for the same business. Even within the same language, small changes in phrasing can pull the model toward a different public layer.

An Italian query using the official name may follow local signals: legal label, owned page, map listing, regional directory. An English query may find visitor-facing descriptions, commerce profiles, old translated entries or reseller pages. If those surfaces agree, the answer may stay stable. If they differ, repeated runs expose the fracture.

The lab does not treat English as merely a distorted copy of Italian. English pages can contain useful, careful explanations. They may be the only surfaces a foreign buyer or traveler can understand. The issue is that they often simplify. A precise Italian category can become a broad English commercial label. A province can become a more recognizable city. A branch can become a general brand description.

Repeated Italian-English comparison shows whether the instability belongs to the model alone or to the public evidence around the business. If Italian runs repeatedly identify the company one way and English runs another, the language surfaces are probably not aligned. If both languages wobble in similar fashion, the identity signals may be weak across the board.
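
One way to make this comparison concrete is to look at the most common category in each language and ask how steadily it recurs. The sketch below is an assumption-heavy illustration: the function names, the 0.75 cutoff for "steady" and the result labels are invented here, and the outcome is a prompt for review, not a diagnosis.

```python
from collections import Counter


def dominant(values):
    """Return the most common value and the share of runs that gave it."""
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


def compare_languages(it_categories, en_categories, steady_share=0.75):
    """Read Italian and English runs side by side.

    steady_share is an arbitrary illustrative cutoff, not a lab standard.
    """
    it_value, it_share = dominant(it_categories)
    en_value, en_share = dominant(en_categories)
    it_steady = it_share >= steady_share
    en_steady = en_share >= steady_share
    if it_steady and en_steady:
        if it_value == en_value:
            return "aligned"
        return "language surfaces probably not aligned"
    if not it_steady and not en_steady:
        return "signals may be weak in both languages"
    return "one language path is unstable"


# Hypothetical category assignments from four runs per language.
split = compare_languages(
    ["retailer", "retailer", "retailer", "retailer"],
    ["manufacturer", "manufacturer", "manufacturer", "retailer"],
)
wobble = compare_languages(
    ["retailer", "manufacturer", "design studio", "retailer"],
    ["reseller", "manufacturer", "retailer", "brand"],
)
print(split)
print(wobble)
```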

This is where the lab’s work becomes useful for owners and SEO leads. They do not need to know every internal mechanism of the system to see that their public record is offering several plausible identities. The repeated runs become a mirror with a crack in the same place each time.

What repeated runs can show a business

A single AI answer can create a dramatic reaction: relief, alarm, annoyance, pride. Repeated runs cool that reaction down. They make the business look at patterns instead of impressions.

For a practical self-review, the lab would record the same prompt several times with small controlled variants: exact name, name plus city, name plus category, Italian query, English query, branch-specific query, recommendation-style query. The point is not to produce a huge dataset. The point is to see whether the same identity mismatch returns.
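
The controlled variants could be generated from a handful of fixed inputs so that only one element changes per prompt. The templates below are hypothetical wording, not prompts the lab has published, and "Esempio Design" is a made-up company name.

```python
def build_prompt_variants(name, city, category, branch):
    """Return one prompt per controlled variant, keyed by variant label."""
    return {
        "exact_name": f"What is {name}?",
        "name_plus_city": f"What is {name} in {city}?",
        "name_plus_category": f"Tell me about {name}, the {category}.",
        "italian": f"Che cos'è {name}?",
        "english": f"What kind of business is {name}?",
        "branch_specific": f"What is the {branch} location of {name}?",
        "recommendation": f"Where should I go for {category} shopping in {city}?",
    }


variants = build_prompt_variants(
    name="Esempio Design",
    city="Milan",
    category="design retailer",
    branch="Brera",
)
for label, prompt in variants.items():
    print(f"{label}: {prompt}")
```

Keeping the labels alongside the prompts lets each saved answer be tied back to the variant that produced it.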

If the same wrong category appears across several prompts, the category signal likely needs review. If the wrong province appears only in English travel phrasing, the business may need clearer English place signals or correction of visitor-facing profiles. If citations keep coming from old directories, the current owned pages may not be the easiest citable sources for the claim. If every run changes in a different way, the conclusion is weaker: the public evidence may be too thin, or the model behavior too opaque for a confident pattern.
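
The first of these rules, the same mismatch returning across several prompts, can be sketched as a grouping step. The tuple layout, the variant labels and the threshold of three prompts are assumptions chosen for illustration. A mismatch seen under only one variant is deliberately left out of the result.

```python
from collections import defaultdict


def recurring_mismatches(observations, min_prompts=3):
    """Group mismatches by (field, wrong value) and keep the recurring ones.

    observations: iterable of (prompt_variant, field, wrong_value) tuples.
    min_prompts: illustrative threshold for "several prompts".
    """
    seen = defaultdict(set)
    for variant, field_name, wrong_value in observations:
        seen[(field_name, wrong_value)].add(variant)
    return {
        key: sorted(variant_set)
        for key, variant_set in seen.items()
        if len(variant_set) >= min_prompts
    }


# Hypothetical observations from one review session.
observations = [
    ("exact_name", "category", "manufacturer"),
    ("name_plus_city", "category", "manufacturer"),
    ("english", "category", "manufacturer"),
    ("english", "place", "Como"),
]

recurring = recurring_mismatches(observations)
print(recurring)
```

Here the wrong category returns under three variants and is flagged for review, while the wrong place appears only in the English query and stays a single observation.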

Repeated runs also protect against overfitting. A business should not rewrite its site because one model once chose a strange phrase. It should look for repeated pressure. The lab’s canon is strict on this point: a conclusion is stated only when several observations show the same pattern, and forecasts remain conditional.

The practical output is not “the AI likes this page.” It is a list of identity fields that held and fields that slipped. Name held; category drifted. Place held; branch boundary slipped. Category held; citation support weakened. That record is less exciting than a dashboard and more honest.

Limits of repeated-run interpretation

Repeated runs do not reveal every internal cause of an answer. Models may change retrieval behavior, source access, ranking habits or answer style in ways the lab cannot observe directly. A repeated prompt may also differ subtly because of interface behavior, browsing availability or context. The lab therefore treats source paths as visible or implied, never as perfectly known unless the evidence is plain.

The method also does not prove that a future wording change will remove instability. Clearer public signals may reduce confusion if they become consistent and easier to cite, but no lab observation can force a model to retrieve them. Forecasts stay conditional for that reason.

There is another limit: instability can be real without being important. If one run says “shop” and another says “retailer,” the business identity may still hold. If one run changes a neighborhood detail that no customer relies on, the practical risk may be low. The lab is interested in instability that changes name, place, branch, category, status or citation support.

The strongest use of repeated runs is therefore modest. They do not produce certainty. They show where certainty would be premature. For Italian businesses with layered names, bilingual pages, branch records and old listings, that is already valuable. A smooth answer can hide a weak identity. Repetition makes the weak joint move.

Ehsaneddin Asgari
responsible for the record
Vetro Source Lab · Italy · March 27, 2026